Introductory Remarks


After spending some thirty years studying 
human intelligence, I decided to write a 
book. Why? It must be that I have something
new to say.

Well, I do and I do not. Virtually everything
that could be said about human intelligence
has been said. It is or isn't important,
it is or isn't evaluated by the tests, it
is or isn't genetically based, and on and on. 
This book does not say anything that has 
not already been said. It could not. What 
it does do is attempt to bring up to date 
the information on the various conflicting 
views. My contribution is to moderate these 
views, because I do not think that any of 
the extreme statements that have been made 
can be supported. 

There is a saying that has been traced back
to the days of classical Greece:

The fox knows many little things; the hedgehog
knows one big thing.

It is not entirely clear what that means, 
but some philosophers have interpreted it as 
saying that some people summarize issues 
with detailed, nuanced views, while others 
make bold, simple statements. I am a fox. I
think the field of human intelligence has had
far too many hedgehogs. There are major
individual differences in cognitive power;
these differences have important implications
for human behavior; they do not have
a single cause, nor do they ever act outside 
of the context of the current problem. We 
need to understand intelligence in its full 
complexity. 

Intellectual foxes have a problem. They 
are more likely to be right than intellectual 
hedgehogs (there is actually data on this!), 
but they are less likely to be believed (there 
is data on this, too). Nevertheless, being a 
fox, there is nothing I can do but try to locate 
the burrows of as many intellectual hedgehogs
as I can, and try to dig them out. It is my
nature, and that is what I have tried to do.
Complete intellectual objectivity is impossible
to achieve. I have tried to present as fair
a picture as I can of a much-studied, much- 
debated topic. The result is a book that may 
sometimes be difficult to read, but I hope 
that it is a comprehensive presentation. 

Any effort of this sort is impossible unless 
you receive support. My first and greatest 
debt is to my wife, Mary Lou Hunt, who 
has put up with years of papers scattered all 
over the house, a somewhat grumpy husband,
and mutterings as I uncovered the
tracks of one or another of those intellectual
hedgehogs.

My second debt is to Cambridge University
Press, which put up with my being late,
late, late, but let me persevere. I also owe
a special debt to Jeanie Lee, for her substantial
assistance in ensuring that permissions
for reproduction were obtained. Too
many books on intelligence wave words at 
the reader about what the data said. Thanks 
in no small part to Ms. Lee’s assistance, this 
book will often let the reader see what actually
was found.

I owe favors to colleagues around the
world who were willing to read prepublication
versions. Special thanks go to
Tom Bouchard Jr., who engaged me in lively
e-mail discussions over virtually every chapter;
to two of my sons, Alan and Steven, who
discussed and commented on different topics
(very different - Alan’s a biophysicist and
Steve an industrial-organizational psychologist);
to Wendy Johnson of the University
of Edinburgh for her comments on genetics;
and to Diane Halpern, for comments
on the introductory chapters. Naturally, I 
am responsible for everything in the final 
product! 

And I suppose I owe an apology to all 
those authors whose works I should have 
read but did not. All I can say is that
life is short and there are an awful lot of 
you. 

Bellevue, Washington 
April 2010 
CHAPTER 1


The Issue of Intelligence 


So it is that gods do not give all men the 
gifts of grace . . . neither good looks nor 
intelligence nor eloquence. 

Homer, The Odyssey 

There’s many a man has more hair than wit. 

Shakespeare, Comedy of Errors, 
act 2, scene 2

1.1. The Idea of Intelligence 

Homer and Shakespeare lived in very different
times, more than two thousand years
apart, but they both captured the same idea:
we are not all equally intelligent. I suspect
that anyone who has failed to notice this
is somewhat out of touch with the species.
However, we cannot simply sort people into
the “intelligent” and the “not-so-intelligent.”
Homer observed that few people have great 
gifts. Shakespeare, more pithily, observed 
that all too many of us do not do terribly 
well at problem solving. Most of us, though, 
fall in between Homer’s desire for eloquence 
and Shakespeare’s worry about lack of wit. 


In this book I will talk about the nature 
of intelligence, its causes, who has it, and 
how it is used. I will do so without the eloquence
of Homer and Shakespeare. I will
take a scientific view. Modern psychology
has a great deal to say about intelligence, and
somehow a great deal that has been said has
been seriously misunderstood. The popular
media sometimes report that the psychologists
who study intelligence say almost the
opposite of what the psychologists actually 
said. 1 

There is a reason for this. The study 
of intelligence is not an isolated academic 
topic; our intelligence has social consequences.
We want our leaders to be intelligent,
and exhibit concern if we think they
are not. There were politically motivated
attacks on the intelligence of Presidents
Lincoln, Truman, Harding, and Ford. Serious
concerns about mental competence

1 See Tannenbaum (1996) for a discussion of this issue 
and references to earlier discussions of the topic. 
Gottfredson (2005) provides a spirited discussion
of how failing to consider the implications of psychological
research on intelligence can be costly to
society. 
were raised about Wilson and Eisenhower,
following strokes, and Reagan, due to early 
symptoms of Alzheimer’s disease. Lincoln 
and Truman, who received the most vicious 
attacks, are now considered two of our finest 
presidents. Eisenhower recovered to function
well; Wilson did not. The effect of
Reagan’s illness upon his second term is still 
a matter of debate. 

Concerns about intelligence are not confined
to concerns about our leaders. Our
school systems use cognitive tests to stream
high school students into different programs.
Colleges use cognitive tests to screen
applicants for higher education programs. 
These tests are never called “intelligence 
tests,” but they correlate highly with them. 

Testing is not confined to the educational 
system. Volunteers for the United States 
military services must obtain passing scores 
on a test of general mental competence, 
the Armed Forces Qualification Test (AFQT).
Similar tests are used in many other countries.
Toward the bottom end of the scale
there are a variety of special assistance programs
for people who simply do not have the
cognitive competence to cope with the complexities
of the modern world. Low intelligence
test scores can be offered as evidence
of diminished mental capacity during the 
penalty phase of a criminal trial. 

While there is broad agreement that 
some people are smarter than others, things 
become more complex when we try to be 
precise. I think that every knowledgeable 
person would agree that Albert Einstein and 
Thomas Jefferson were both highly intelligent.
Who was the more intelligent? That
is hard to say; they were brilliant in different
ways at different times. It would be easy
to find other examples of the same point.
There are clearly varieties of cognitive skill,
especially at the top. As a result some modern
observers have concluded that there is
no single dimension of intelligence. 

This idea is not new. In the sixteenth 
century the Spanish physician/philosopher 
Juan Huarte de San Juan 2 drew a remarkably 
cogent picture of individual differences in 

2 Huarte de San Juan, 1575/1991. 


human thought. Huarte believed that when 
people attack problems some will use their 
imaginations to envisage how a solution 
might work out, while others will rely 
on their memories of solutions that have 
worked in the past. Huarte also defined
“understanding” (entendimiento) as a separate
capacity, implying that one can be
bright without having a good understanding
of a situation. Huarte’s distinction between
problem solving by imagination or by memory
is mirrored in contemporary theories
that distinguish between the ability to
do abstract reasoning and the ability to
apply previously learned solution methods. 3
Robert Sternberg, a prolific modern writer
on intelligence, has emphasized the distinction
between analytic intelligence and
the ability to understand complex social 
situations. 4 

Huarte anticipated another modern idea, 
the need to have a biological explanation for 
intelligence. Huarte offered a theory based 
on the sixteenth-century notion that the 
body is governed by four “humors” - blood, 
bile, black bile, and phlegm. This theory of 
biology has long since been discarded. The 
idea that there should be a biological explanation
for individual variations in cognition
has been retained. One of the most active
areas of modern intelligence research deals
with the relation between intelligent behavior
and the brain.

Let us leap from the sixteenth century 
to the nineteenth, and to one of the most 
colorful characters in the history of science,
the Victorian physician, mathematician,
and explorer Sir Francis Galton. Galton
explored in Africa, made major contributions
to the development of statistics, and
conducted research in psychology, most
notably on intelligence. He wholeheartedly
endorsed the theory of evolution proposed
by his cousin Charles Darwin. Galton
believed that human intelligence was largely
inherited. He also maintained that intelligence
was one manifestation of a person’s
overall constitutional fitness. Therefore, 

3 Horn & Noll, 1994.

4 Sternberg, 2003. 
it should be possible to learn something
about a person’s intelligence by examining
his or her physique, including brain size, and
by determining the efficiency of the person’s
nervous system by doing such things
as recording the speed with which he or she 
reacted to a signal to strike a bag. These ideas 
are alive, in much expanded form, today. 

The next step was taken at the start of 
the twentieth century, when the Frenchman 
Alfred Binet developed the first intelligence 
tests to be used in schools. Testing has dominated
the study of intelligence since then,
so we need to look more closely at the idea. 

1.2. Testing 

If you want to go beyond saying “people 
are different” you have to offer some way 
of measuring those differences. There is an 
imperative to develop such measures if a 
society wants to assign different roles to 
different people, based on their personal 
characteristics. Not all societies do this, all 
the time. There was no intelligence test 
for the ruler in the hereditary kingships of 
medieval Europe. The Hindu caste system 
pre-assigned people to social roles, based 
upon their birth. It is notable, though, that 
both these societies experienced a good deal 
of conflict due to their restricting people’s 
social roles. 5

In village-based societies personal knowledge
of individuals plays a major role in
assigning people to jobs. When American
pioneers began to move into the northern
plains states Sitting Bull, the paramount
chief of the Lakota (Sioux) Indians, selected
Crazy Horse to be war leader from among
people whom he knew personally. 6 That

5 During the Hundred Years' War between England 
and France (roughly 1350-1450) the French court was 
disrupted when Charles the Foolish inherited the 
throne. He probably suffered from a bipolar psychosis.
In India the Sikhs were formed largely as a
protest against the rigid social structure enforced by 
the Hindu caste system. 

6 He did well. Crazy Horse defeated General Custer 
at the Battle of the Little Big Horn. This was the 
most stunning Native American victory in the history
of the western expansion.


technique does not work in today’s large 
societies, where there are many positions 
to be filled in both government and industries.
Leaders cannot possibly know all their
subordinates, let alone the subordinates’
subordinates. Our society requires formal
machinery for selecting candidates either
into employment, directly, or into educational
systems that serve as channels to
future employment. 

Many societies solve this problem by an 
elaborate form of recommendations. A boy 
or (historically less often) a girl who is
thought to be talented is sent off for training 
and/or apprenticeships. While the details 
have been lost, this appears to be the way 
the ancient Egyptians selected children to 
be trained as scribes. It was also the way in 
which second, third, and fourth sons were 
recruited for the priesthood (or the army) 
in medieval Europe. The person was not 
needed at home, and somebody had connections
enough to start them on a career. The
use of connections is certainly not unknown 
in modern times. But we do rely on another 
method of personnel selection: testing. 

There have been many objections to testing.
In evaluating them it is well to keep one
thing in mind. Society needs a mechanism 
for personnel selection. Not everyone can 
have whatever they want. Students have to 
be selected, jobs have to be filled, and when 
behavioral problems arise, mental competence
must be assessed. If you do not like
testing, what is your alternative? 

1.2.1. Testing Before Psychology

Modern psychologists did not invent testing.
In the days when the Chinese emperor
claimed to rule “the Earth, the Moon, and 
three quarters of the Sun” an elaborate 
series of local, regional, and nationwide tests 
was used to select officers for the imperial 
bureaucracy. Candidates had to write traditional
poetry and to explain the importance
of fearing the will of heaven and knowing
the words of the sages. Evidently it was
assumed that a person who could do these 
things could collect the imperial taxes or be 
an ambassador to the Mongols. 
Some centuries later the British Empire
emulated the Chinese Empire. Until after
World War II career positions in both the
Indian empire and the British upper bureaucracy
were filled largely from the ranks of
people who had read history, classics, and
occasionally economics at the elite universities,
especially Oxford and Cambridge. The
British assumed that someone who could do 
well on oral and written examinations of the 
writings of Horace and the wars of Caesar 
would have the ability to administer India or 
to ferret out secrets while on Her Majesty's 
Service. 7 

Such techniques of personnel selection 
may sound quaint to us, but, on the whole, 
they worked. The Chinese bureaucracy held 
the empire together in a way that no succession
of able emperors could ever have
done alone. Properly educated British gentlemen
administered India reasonably well
for two hundred years. The classics program
at Cambridge University produced at
least four remarkably effective twentieth-century
spies. Unfortunately, they spied
against Britain for the Soviet Union, but
that is a motivational rather than a cognitive
aberration. 8 They were good at what
they did. 

Why did these exercises in testing apparently
irrelevant knowledge do a reasonably
good job of selecting people able to run very 
large, complex empires? Or for that matter, 
able to fool a modern counterintelligence 
agency? 

What the British and the Chinese had 
stumbled on, and what we today attempt to 
evaluate, was a collection of mental traits 
that, collectively, we call intelligence. These 
traits define individual differences in skills 
that have broad application in many settings. 
One of the most important aspects of intelligence
is an ability to learn. You demonstrate

7 Gardner, Kornhaber, & Wake, 1996, pp. 12-16. 

8 Anthony Blunt, Guy Burgess, Donald Maclean,
and Kim Philby. Burgess, Maclean, and Philby fled
to the Soviet Union when they were about to be
exposed. Blunt stayed in England, undiscovered,
and became a distinguished art historian. His espionage
role, which seems to have lasted through the
1940s and 50s, was not publicly revealed until 1979. 


this by showing that after exposure to 
knowledge you have learned something. The 
skills needed to learn the wisdom of Confucius
or the philosophical ideas of Socrates
are not exactly the skills you need to run
an empire, but there is an overlap. For that
matter, the skills needed to do well on a college
entrance test are not exactly the skills
you need to acquire a bachelor's degree, but
there is an overlap. That is why both the classic
and the modern testing systems work. It
is also why they work imperfectly. 

1.2.2. Alfred Binet Invents Modern
Intelligence Testing 

At the start of the twentieth century the 
French Ministry of Education had a problem.
The idea of universal public education
had been accepted, but the schools did not 
seem to work for some students. How did 
this problem arise? 

France, like all modern democracies, was 
(and is) committed to providing public education
for all its citizens, so that all children
have an opportunity to compete for desirable
positions in society. This goal is not easy
to achieve. 

Modern schooling is an historically 
unusual form of education. Before 1800 
most humans were educated “on the job” -
observing and then helping adults, and 
serving as apprentices. Universal education, 
the requirement that every child learn by 
practicing seemingly esoteric exercises in 
a setting divorced from everyday life, is 
a late nineteenth-/early twentieth-century 
idea. By 1900 it was apparent to educators 
that some children have a great deal of trouble
learning in this manner. The French educational
administration needed to have a
way of identifying such children, so that they 
could either be dropped from the system 
or channeled into an educational program 
more suited to their capabilities. 

The fact that France was a democracy 
imposed an added constraint. French educators
needed an objective method, in preference
to the subjective impressions of
“persons in authority,” the teachers and principals.
In an authoritarian regime there is no
need for such a method; if the authorities
don’t like you, you’re out.

So from the very first, testing was embedded
in the society that required it. 9

In order to meet this challenge the Education
Ministry hired Alfred Binet, who had
worked for a while with Galton. Binet began 
by making two important assumptions. The 
first was that mental competence increases 
over the childhood years. The typical six- 
year-old can solve problems a four-year-old 
cannot; a four-year-old can solve problems a 
two-year-old cannot, and so on, at least from 
birth to the late teenage years. Therefore, it 
makes sense to talk about mental age (MA) - 
the level of mental competence at which a 
child is operating. 

Binet took a pragmatic approach to the 
measurement of mental age. He asked experienced
teachers what sorts of problems children
could solve at different ages. Once he
had a set of problems typical of the ones children
could solve at age six, seven, eight, and
so on, he could determine a person's mental
age by finding the most difficult problems
that a child could solve. Mental age
could then be compared to chronological
age (CA), to determine whether a child has
been performing below, at, or above the cognitive
level that would be expected.

Binet then made his second, more 
arguable, assumption. He assumed that a 
child’s relative standing in mental development,
compared to his age group, will
remain fairly constant as the child grows 
up. If Sammy and Tommy are both six, 
but Sammy has a mental age of eight and 
Tommy one of five, Binet assumed that four 
years later, when they are both ten, Sammy 
would have a mental age higher than ten, 

9 The contrast between the French and the Chinese 
and British imperial systems is informative. The 
Chinese and British systems were designed to select 
a sufficient number of qualified candidates for government
functions. So long as the supply of young
officers and bureaucrats was adequate, there was 
little concern that the system might have shut out 
potential candidates. The French testing program 
was designed to staff society, without favoring some 
citizens over others. Any government has to solve its 
staffing problems. Only democracies have to justify 
the staffing system to the citizens. 


and Tommy a mental age lower than ten. 
Therefore, it follows that if you test children 
on entrance to school (age six or seven), and 
you find that some are markedly behind (i.e., 
have mental ages in the three-to-four range), 
those children are likely to be behind their 
classmates at all ages, and therefore are candidates
for removal from the normal school
program. That is what the French education 
system wanted to know. 10 

The Education Ministry accepted Binet’s 
argument. The modern era of intelligence 
testing had been born. 

1.2.3. The Intelligence Quotient (IQ)

Binet did not use the term Intelligence Quotient
(IQ) because the concept of mental
age was sufficient for classifying children
who were entering school. As mental testing
expanded to the evaluation of adolescents
and adults, however, there was a need
for a measure of intelligence that did not 
depend upon mental age. Accordingly the 
intelligence quotient (IQ) was developed. 
There have been changes in the definition 
and use of the term since its introduction. 
The details are provided in panel 1.1. Here 
we proceed directly to the modern use of 
the term. 

The term IQ is used in two ways, which 
I will call the narrow and broad uses of the 
term. 

The narrow definition of IQ is a score 
on an intelligence test, developed according
to a scoring protocol where “average”
intelligence, that is, the median level of performance
on an intelligence test, receives a
score of 100, and other scores are assigned 
so that the scores are distributed normally 
about 100, with a standard deviation of 15. 
Some of the implications are that: 

1. Approximately two-thirds of all scores 
lie between 85 and 115. 

2. Five percent (1/20) of all scores are above 
125, and one percent (1/100) are above 
135. Similarly, five percent are below 75 
and one percent below 65. 

10 Binet & Simon, 1905. 
Panel 1.1. The Intelligence
Quotient (IQ) 


Mental age is inadequate as a means of 
comparing the intelligence of two children
of different chronological ages (CA).
Suppose a six-year-old and a ten-year-old
both have a mental age (MA) of eight.
Cognitively, they are likely to be very
different individuals, for one is developing
rapidly and the other is developing
slowly. The need for a measure of mental
development that is independent of
chronological age led to the concept of 
the Intelligence Quotient, IQ, which was 
originally defined as the ratio of mental 
age to chronological age, multiplied by 

100.*

$$\mathrm{IQ} = 100 \cdot \frac{\mathrm{MA}}{\mathrm{CA}}. \qquad (1.1)$$


To illustrate, a ten-year-old who can 
solve problems at the level of difficulty 
expected of a twelve-year-old would have 
MA = 12, CA = 10, IQ = 120. An IQ of 100
indicates that the child’s cognitive development
is proceeding on schedule, an IQ
above 100 indicates acceleration, and an
IQ below 100 indicates slowed development.
In the case of our hypothetical six-year-old
and ten-year-old, both of whom
have a mental age of eight, the first child
would have an IQ of 133 and the second
an IQ of 80. In modern educational terms
the first child might be considered for
an accelerated program, while if the second’s
IQ score were accompanied by difficulties
with schoolwork he or she would
be a candidate for a special education program.
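
The ratio definition just illustrated (mental age divided by chronological age, multiplied by 100) can be checked with a few lines of Python. This is only a minimal sketch: the function name ratio_iq is introduced here for illustration and does not come from the panel, and the ages are the ones used in the examples above.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ as originally defined: 100 * MA / CA."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age / chronological_age


# The three cases worked through in the text.
print(round(ratio_iq(12, 10)))  # ten-year-old with MA 12 -> 120
print(round(ratio_iq(8, 6)))    # six-year-old with MA 8  -> 133
print(round(ratio_iq(8, 10)))   # ten-year-old with MA 8  -> 80
```

The same function also shows why the ratio breaks down for adults: a sixty-year-old with the mental powers expected at forty would receive a score of about 67, which is not a sensible description of that person.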

This method of calculating IQ will not 
work with adults, because intelligence 
does not increase linearly with age past 
childhood. A man of sixty whose mental
powers are equal to those expected
of a forty-year-old would not be considered
a case of retarded development!
Therefore, the IQ ratio just described has
been replaced by a measure based on the
notion that IQ should reflect a person’s
relative standing within his or her own 
age group. 

Intelligence tests are standardized by 
giving them to a relatively large sample
of people chosen to be representative
of the population for whom the test is
intended. In the case of a test intended for
broad use, such as establishing the mental
competency of adults, this is essentially
the entire population of the country, so
attempts are made to obtain a large representative
sample of the United States, the
United Kingdom (for the British version),
Spain (for the Spanish version), and so
forth.† The sample should be sufficiently
large that a distribution of scores can be 
obtained for different age groups. 

The raw test score is based on the 
number of items correctly answered 
and/or the difficulty level of the item. 
(The scoring algorithm varies somewhat 
with the test, as discussed in Chapter 2). 
The IQ score is derived from the raw 
score in the following way. 

Let y be a raw score for a person in
a particular age group, and let M be the
mean and S the standard deviation‡ for all
scores in the reference group. The corresponding
standard score is

$$z = \frac{y - M}{S}. \qquad (1.2)$$

The IQ score is derived from this by the
conversion

$$\mathrm{IQ} = 15z + 100. \qquad (1.3)$$


If the raw scores were normally distributed,
then the resulting IQ scores will
be normally distributed with a mean of 
100 and a standard deviation of 15. The 
graphic depiction of this distribution is 
the famous “Bell Curve.” The Bell Curve 
for IQ distributions is shown in Figure 1.1. 
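
The two-step conversion described above (standardize the raw score against the reference group, then rescale to a mean of 100 and a standard deviation of 15) might be sketched in Python as follows. The function name deviation_iq and the reference scores are made up for illustration, and the sketch assumes that S is the population standard deviation of the reference group (statistics.pstdev); a real test's norming procedure is more elaborate.

```python
from statistics import mean, pstdev


def deviation_iq(raw_score: float, reference_scores: list[float]) -> float:
    """Convert a raw score to an IQ score: z = (y - M) / S, IQ = 15 * z + 100."""
    m = mean(reference_scores)    # M: mean of the reference group
    s = pstdev(reference_scores)  # S: standard deviation (population form, an assumption)
    if s == 0:
        raise ValueError("reference scores must vary")
    z = (raw_score - m) / s
    return 15.0 * z + 100.0


# Hypothetical raw scores for one age group (illustrative only).
reference = [30, 40, 50, 60, 70]

print(deviation_iq(50, reference))  # a raw score at the group mean maps to 100.0
```

A raw score exactly one standard deviation above the reference mean comes out as 115, whatever the raw scale happens to be.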

Why not record scores in the standard 
score format? Statisticians, psychometricians,
and research workers would probably
prefer to do this. The advantages
have to be weighed against two countervailing
trends: tradition and public relations.
IQ scores were introduced almost
a century ago; people are used to them. 
Standard scoring is a bit esoteric for the 
nonstatistician. For a statistician, having a 
scale with a mean of zero and a standard 
deviation of 1 is a convenience, nothing 
more. If a (nonstatistical) parent were to 
be told that their child had scored a zero 
on an intelligence test, they might interpret
this as a claim that their pride and
joy had the intelligence of a rock! Half the 
people who took the test would receive 
negative scores, which could lead to all
sorts of misunderstanding. Appearances
can be important. 

* The concept of IQ was developed by the German
psychologist William Stern, not by Binet.
† This procedure contains the implicit assumption
that the distributions of intelligence are
the same across populations. Thus if you consider
a score of 100 on the Spanish form of
the test to indicate the same thing as a score
of 100 on the American version of the test,
you are implicitly assuming that Americans and
Spaniards have equivalent intelligences, on a
population basis. Such assumptions have been
vigorously debated. The controversies are discussed
in detail in Chapter 11.
‡ The standard deviation is a measure of the
amount of variation in a population. More
details are given in Chapter 2.


Thus IQ, in the narrow sense, is a score 
indicating a person’s relative performance 
on an intelligence test, compared to the performance
of people in an appropriately chosen
comparison group. This does not completely
clarify the matter, because there can
be debate about what counts as an intelligence
test. This matter is also discussed
on page 8. I will attempt to be clear about
how the term is being used in various contexts.

In the broad sense the term IQ is used 
as a synonym for intelligence, that is, as a 
shorthand term for individual differences in 
cognition. This can lead to confusion, so I 
shall attempt to use IQ only in the sense of 
a test score. The term intelligence will be used 
to refer to the broader concept of individual 
differences in mental ability. In my usage, a 
person who has high intelligence will probably
have a high IQ score, but the distinction
between the two is important. 

In interpreting IQ scores it is often useful
to think of percentiles, which indicate
the percentage of people in the referent 
group whose scores are below a certain level. 
What that level is will be determined by the 
IQ score and by the properties of the Bell 
(normal) Curve itself. Table 1.1 gives some 
important reference scores. The properties 
of these scores follow from the assumption
that IQ scores will follow the normal distribution,
which is illustrated in Figure 1.1.

As a result, in terms of the modern scoring,
if someone says that their child has an
IQ of, say, 120, this does not mean that the 
child’s mental age is 20% higher than his 
chronological age. It means that the child 
has a test score in the top 9% of test scores 
at the child’s age. 
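
The "top 9%" figure follows directly from the normal curve with mean 100 and standard deviation 15. As a rough check, assuming Python 3.8 or later (for statistics.NormalDist), the percentages can be computed as below; the helper names percent_below and percent_above are introduced here for illustration.

```python
from statistics import NormalDist

# IQ scale as defined in the text: mean 100, standard deviation 15.
IQ_DIST = NormalDist(mu=100, sigma=15)


def percent_below(iq: float) -> float:
    """Percentage of the reference group expected to score below `iq`."""
    return 100.0 * IQ_DIST.cdf(iq)


def percent_above(iq: float) -> float:
    """Percentage of the reference group expected to score above `iq`."""
    return 100.0 - percent_below(iq)


for iq in (65, 85, 100, 115, 120, 130):
    print(f"IQ {iq}: {percent_below(iq):7.3f}% below, {percent_above(iq):7.3f}% above")
```

For an IQ of 120 the share above is about 9.1%, which is the "top 9%" quoted in the text; this holds only to the extent that scores in the reference group really follow the normal curve.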

Why can we assume that IQ scores are 
distributed normally? The answer is simple. 
IQ tests (and many other tests) are constructed
by choosing appropriate numbers
of easy, intermediate, and hard items, so
that the total scores will be normally distributed
in the population for which the test
was intended. The Bell Curve is an artifact 
of the way the test is constructed! There is 
no definition of IQ independent of the tests 
themselves. This contrasts with a variable 
like height, which is defined independently 
of yard sticks or meter sticks. Height happens
to be distributed approximately normally,
within the populations of adult males
and females. The distribution of height is a 
fact of nature. The fact that IQ test scores 
are normally distributed is an outcome of 
the test construction procedure. Nevertheless,
it is a reasonable thing to do. Why?

IQ scores are used to describe people rel¬ 
ative to each other. They are also used to 
Table 1.1. The distributions of standard and IQ scores in
terms of the percentage of people above or below selected 
scores. An IQ of 65 would, if accompanied by other 
indications of mental incompetence, be cause for considering 
a person mentally disabled. If IQ is distributed normally, 
about one percent of all people have IQ scores this low. 
Average IQ is, by definition, 100. Approximately half of all 
scores lie between 90 and 110. About 16% of the scores lie 
above 115, 2% above 130, and 1% above 135. MENSA, an
organization whose members have high IQ scores, defines 
the 4 sigma group, people with IQs over 160. (Sigma is a 
term frequently used to refer to the standard deviation.) This 
level of score would be expected three times in every 100,000 
observations. 


Standard Score (z)    IQ Score    % Below    % Above

      -2.33              65         0.982     99.018
      -2.00              70         2.275     97.725
      -1.00              85        15.866     84.134
      -0.67              90        25.249     74.751
       0.00             100        50.000     50.000
       0.67             110        74.751     25.249
       1.00             115        84.134     15.866
       2.00             130        97.725      2.275
       2.33             135        99.018      0.982
       3.00             145        99.865      0.135
       4.00             160        99.997      0.003


make predictions and to indicate associa¬ 
tions, as in predicting a student's likely aca¬ 
demic progress or investigating the associa¬ 
tion between intelligence and income. There 
are technical reasons for wanting to deal 
with normally distributed scores when we 
apply the statistical methods used for mak¬ 
ing predictions and analyzing associations. 
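
The percentages in Table 1.1 can be recomputed from the normal distribution. The following Python sketch assumes a mean of 100 and a standard deviation of 15, which is what the table's pairing of z = 1.00 with IQ = 115 implies. The function names normal_cdf and iq_percentages are illustrative only and do not come from any testing standard.

```python
import math

def normal_cdf(z):
    """Fraction of a standard normal distribution lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def iq_percentages(iq, mean=100.0, sd=15.0):
    """Return (z, percent below, percent above) for an IQ score.

    The mean and sd defaults are assumptions inferred from Table 1.1.
    """
    z = (iq - mean) / sd
    below = 100.0 * normal_cdf(z)
    return z, below, 100.0 - below

# Print one row per selected IQ score, in the layout of Table 1.1.
for iq in (65, 70, 85, 100, 115, 130, 145, 160):
    z, below, above = iq_percentages(iq)
    print(f"z = {z:+.2f}  IQ = {iq:3d}  below = {below:7.3f}%  above = {above:7.3f}%")
```

Because the standard score is simply (IQ - 100) / 15 under these assumptions, any other cutoff can be checked the same way by passing a different IQ value.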

There is another, less technical reason for 
requiring that IQ scores be normally dis¬ 
tributed. Many other human qualities that 
can be measured on scales with physical 
interpretations, like height and weight, are 
distributed normally. It seemed to many of 
the early researchers that if we could mea¬ 
sure intelligence in some physical manner, 
such as measuring the efficiency of the ner¬ 
vous system, these measures would probably 
turn out to be normally distributed. There¬ 
fore, it seemed appropriate to require that 
IQ scores be normally distributed. 

In the late nineteenth and early twentieth 
century this reasoning seemed compelling, 


because the normal distribution itself was 
regarded (almost mystically) as a Law of 
Nature. Today we are a bit more skepti¬ 
cal, but there is still a good argument for 
assuming normality. If a person’s intelli¬ 
gence is due to a large number of indepen¬ 
dent causes, each of which has a small effect, 
intelligence would be distributed normally 
across the population. 
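
The argument about many small, independent causes can be tried out with a short simulation. This is only a sketch under stated assumptions: each hypothetical person's trait is the sum of 50 independent causes, each drawn uniformly between -1 and 1. The number of causes, the uniform shape, and the names simulate_trait and fraction_within are all illustrative choices, not claims about how intelligence actually arises.

```python
import random
import statistics

def simulate_trait(n_people=5000, n_causes=50, seed=0):
    """Each simulated person's trait is the sum of many small, independent causes."""
    rng = random.Random(seed)
    return [
        sum(rng.uniform(-1.0, 1.0) for _ in range(n_causes))
        for _ in range(n_people)
    ]

def fraction_within(values, n_sd):
    """Fraction of values lying within n_sd standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= n_sd * sd for v in values) / len(values)

traits = simulate_trait()
# For a normal curve these fractions are about 0.683 and 0.954.
print(f"within 1 SD: {fraction_within(traits, 1):.3f}")
print(f"within 2 SD: {fraction_within(traits, 2):.3f}")
```

No single cause in the simulation is normally distributed, yet the sums come close to the proportions expected under a normal curve. That is the sense in which the "large number of small causes" argument supports an assumption of normality.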

A certain amount of the confusion 
between the broad and narrow senses of IQ 
is due to the way in which cognitive tests 
are described. Some tests are explicitly mar¬ 
keted as intelligence tests. But because the 
term IQ, and sometimes even the term intel¬ 
ligence, have acquired a bad taste in cer¬ 
tain circles, many tests of cognitive skills 
are not marketed as intelligence tests, even 
though these tests are highly correlated 
with tests that are marketed as intelligence 
tests! For instance, in a widely read and
highly controversial report, Richard Herrn-
stein and Charles Murray used the Armed

Figure 1.1. The "Bell Curve" for IQ. The area under this curve
represents 100% of the population. The area under the curve and to
the left of a given IQ value (on the abscissa) represents the fraction
of people in a population who have IQs lower than the indicated 
IQ. Conversely, the area to the right indicates the fraction of 
people who have this IQ or a higher one. For example, 50% of the 
area under the curve lies to the left of IQ = 100, indicating that half 
the population has an IQ of less than 100. Nine percent of the area 
under the curve lies to the right of 120, indicating that only nine 
percent of all people have IQs of 120 or higher. The Bell Curve for 
IQ scores is a special example of the normal, or Gaussian, 
distribution. At the extremes, the curve never quite touches the 
abscissa (“x axis”), but this cannot be shown on the graph. 


Forces Qualification Test (AFQT) as a mea¬
sure of intelligence, and treated AFQT and
IQ scores as being virtually synonymous. 11 
The US Department of Defense never refers 
to the AFQT as an intelligence test. Similar 
confusions arise with the SAT. Many 
research projects have used SAT scores as a 
measure of intelligence, although the test’s 
publisher, the Educational Testing Service, 
does not describe it as an intelligence test. 

There is great controversy over whether 
or not IQ scores should be treated as 
real indicators of mental ability. Panel 1.2 
presents a historical debate that took place 
in the 1920s, but in many ways foreshadowed 
contemporary arguments. I shall come down 
squarely in the middle of the controversy. I 
will argue that the scores certainly do mean 

11 Herrnstein & Murray, 1994.


something, but they may not mean as much 
as some enthusiasts claim. 

1.2.4. What Binet Discovered: “Drop
in from the Sky” Testing Works

Let us take a closer look at what Binet 
assumed and what he found. 

Binet’s assumption that mental compe¬ 
tence increases as children grow older is 
certainly correct. Mental competence may 
decrease in old age, but that is another story, 
and was of no concern to Binet. He was also 
correct that there are marked individual dif¬ 
ferences in the rate at which mental compe¬ 
tence increases. 

His second assumption, that relative 
standings remain constant as children age, is 
true on the whole, but there are exceptions. 
As a toddler, Albert Einstein was a relatively

Panel 1.2. Defining Intelligence: The
Debate between Mr. Lippmann and 
Professor Boring 

In science clarity of definition is essen¬ 
tial, for good definitions make clear what 
the important questions are. The study of 
intelligence has been plagued by a lack of 
precise definitions. The debate between 
Lippmann and Boring, early in the twen¬ 
tieth century, shows how the failure to 
define terms introduced confusions that 
continue to this day. 

Following the use of tests in World 
War I, intelligence testing became 
a growth industry. So, inevitably, it 
attracted the attention of learned com¬ 
mentators - people who, if there had 
been TV in those days, would have 
appeared as talking heads on the Sunday 
morning pundit shows. One of the most 
respected of these commentators, Wal¬ 
ter Lippmann, did not at all like the new 
technology. He was particularly incensed 
by a claim, based on analyses of the Army 
Alpha data, that the average American 
had a mental age of fourteen. In Lipp- 
mann's own words: 

The intelligence test, then, is an instru¬
ment for classifying a group of peo¬
ple, rather than “a measure of intel¬
ligence.” People are classified within a
group according to their success in solv¬
ing problems which may or may not be
tests of intelligence. They are classified
according to the performance of some
Californians in the years 1910 to about
1916 with Mr. Terman’s notion of the
problems that reveal intelligence. They
are not classified according to their abil¬
ity in dealing with the problems of real
life that call for intelligence.

(Lippmann, 1922a) 

Lippmann argued that the test devel¬ 
opers had produced a barrage of statis¬ 
tics that had the trappings of science, 
but were not scientific. The tests them¬ 
selves were based on hunches by people 
such as Terman that this or that behavior 


indicated intelligence, rather than on 
any scientific theory of what constituted 
intelligence. Lippmann also doubted that 
a classification of people on the basis of 
test scores would map onto a classifica¬ 
tion according to their ability to “deal 
with problems of real life that call for 
intelligence.” 

Academia responded. The Harvard 
professor E. G. Boring* clarified the mat¬ 
ter by asserting: 

Intelligence is what the intelligence tests
test.

(Boring, 1923)

So there!

The exchange between Lippmann and 
Boring foreshadowed a debate that is 
active today. Should a test be developed 
inductively, by the pragmatic procedure 
of identifying people who are believed to 
vary in intelligence, seeing what behav¬ 
iors distinguish those who are intelligent 
from those who are not, and then incor¬ 
porating these behaviors into a test? Or 
should a test be based on a theory of how 
individual differences in cognitive power 
come to arise? 

The answer is not a simple one. To 
see why, consider the following analogy. 
I think that everyone would agree that 
people differ in their physical fitness. But 
what is physical fitness? You could take 
the approach of an athletic coach, and ask 
people to run, jump, throw weights, and 
so forth in order to determine physical fit¬ 
ness. Alternatively, you could take a med¬ 
ical approach. Physical fitness depends 
on muscular adequacy, reaction, and the 
ability of the heart and lungs to provide 
fuel to the muscles. So let us take mea¬ 
sures of cardio-pulmonary capacity and 
construct tests of the strength of isolated 
muscle groups and the speed of neural 
impulses. 

Binet, Terman, and their many suc¬
cessors took the coach’s approach. Lipp¬
mann seems to have wanted a more the¬
oretically justified approach, although he
did not offer one. (After all, he was a jour¬
nalist, not a scientist.) Lippmann did not
deny that something called intelligence 
exists, for he spoke of dealing with prob¬ 
lems that call for intelligence. His objec¬ 
tion was to the tests offered to evaluate it. 

Boring’s response, identifying intelli¬ 
gence with a test score, strikes many 
(including me) as somewhat arrogant, for 
it confounds the concept of intelligence 
with a score on an imperfect indicator of 
intelligence. However, it does lead to a 
research agenda. 

The first item on the agenda is to ask 
whether or not the tests identify peo¬ 
ple who would otherwise be considered 
intelligent. Lippmann said they would 
not, but that was his opinion, not an 
observation of fact. 

To some extent the way in which 
the tests were developed ensured that 
they did identify “the intelligent.” Binet 
and Terman began with groups who had 
known variations in cognitive compe¬ 
tence (e.g., younger and older schoolchil¬ 
dren), and then developed tests whose 
scores reflected the variation. Lippmann 
pointed out that this procedure makes 
the definition of intelligence dependent 
upon an appropriate choice of exami¬
nees for the initial (standardization) test¬
ing. This was an important observation.
Today there is a lively debate over the 
appropriateness of using intelligence tests 
that have been standardized in the post¬ 
industrial world to evaluate intelligence 
in the developing nations of Africa, Asia, 
and Latin America. 

But Boring had a point, too. If test 
scores are accepted as good indicators of 
intelligence, the tests become a power¬ 
ful tool for investigating individual dif¬ 
ferences in cognition. You can determine 
how test scores are related to variations 
in brain structure and/or process, genetic 
makeup, schooling, family support, and a 
variety of other causal factors. Test scores 
can also be used as predictors of success 


or failure in all walks of life. We can 
then consider different explanations for 
observed relationships. Science advances 
by accounting for relationships between 
observables, not by debating beliefs about 
what ought to be the case. 

Boring's view has led to the psycho¬
metric approach to intelligence. It has
resulted in a viable research agenda. But 
it contains a weakness. 

If intelligence is equated with an IQ 
score, the study of intelligence is reduced 
to an analysis of the causes and conse¬ 
quences of test scores. So if the origi¬ 
nal process of test construction missed 
something important about intelligence, 
the assertion that intelligence is what the 
tests test makes it hard to incorporate 
new perspectives. Lippmann was right to 
be concerned over the conceptual limita¬ 
tions of a fascination with test scores. 

The argument between Lippmann and 
Boring foreshadowed future develop¬ 
ments. There are many advocates of 
the psychometric approach today. It is 
not entirely devoid of theory. In fact, 
the analysis of pragmatically defined test 
scores has suggested theoretical posi¬ 
tions about intelligence that can then 
be expanded to include new measures. 
Some of these efforts will be presented in 
later chapters. 

There are also contemporary psychol¬ 
ogists who accept Lippmann’s concerns 
(although he is seldom cited!) and then 
go beyond criticizing to try to develop a 
theoretical basis for intelligence and intel¬ 
ligence testing. Such a theory will be use¬ 
ful if it leads to a coherent picture of 
how individual differences in cognitive 
power arise and influence human expe¬ 
rience from birth to death. 

In Section 1.4 I present my own views 
about what such a theory has to deal with, 
and explain the difficulties that it encoun¬ 
ters. 

* Boring, 1923.

late talker! Most of us can think of some¬
one who was not terribly impressive in grade
school or high school, but who had a great 
college career. Things go the other way, too. 
The smartest kid in grade school does not 
always become a Phi Beta Kappa in college, 
let alone becoming a prize-winning physicist 
or a fabulously wealthy entrepreneur. 

Such examples are striking, but they are 
exceptions to a general rule. Past the age 
of about ten, indicators of relative cogni¬ 
tive competence are fairly stable. A Scot¬ 
tish study found a substantial correlation 
between intelligence test scores taken at 
age eleven and subsequent measures taken 
when the examinees were in their sixties 
and seventies. 12 In psychological jargon intel¬ 
ligence is a trait, a characteristic of the indi¬ 
vidual that is reasonably stable over time and 
that is revealed in many situations. 

Binet’s third assumption was the unno¬
ticed 600-pound gorilla in the room. Let us
accept that there are individual differences
in the mental competencies required to be
successful in modern society. 13 These com¬
petencies, collectively, are what we mean
when we say “intelligence.” Binet assumed, 
and then showed, that it is possible to make 
reliable measurements of a substantial num¬ 
ber of the traits that constitute intelligence, 
within the space of three or four hours. 

I suggest that the reader go back and 
weigh each clause of that sentence. Binet 
showed that some of the cognitive traits 
important to success can be evaluated within 
a single testing session, conducted outside 
of the context of everyday activities. I like a 
term originally coined by Robert Mislevy, 14
Drop in from the Sky testing, because it cap¬ 
tures the examinee’s view of the out-of- 
context nature of the assessment. Much of 

12 Deary et al., 2000. 

13 Individual differences in mental competence prob¬ 
ably occur in every human society, although the 
relative importance of specific competencies may 
vary from society to society. The ability to locate 
one’s self in space is less important in a society that 
has street signs and global positioning systems than 
in a society of hunter-gatherers. 

14 Mislevy, formerly at ETS and now a professor at 
the University of Maryland, is a highly respected 
specialist in the field of psychometrics, the mathe¬ 
matical analysis of test scores. 


the research in the century following Binet 
can be looked upon as an attempt to refine 
and expand the measurements that can fit 
into the “Drop in from the Sky” paradigm. 

Throughout this book I shall argue that 
by accepting the conventional testing para¬ 
digm as a given, intelligence researchers have 
too often ignored traits that cannot be 
measured within the paradigm but that, 
by any reasonable definition, are part of 
intelligence. Nonetheless, a nontrivial sub¬ 
set of human cognition does fit into the 
“Drop in from the Sky” paradigm, and hence 
can be evaluated using conventional testing 
methods. 

1.2.5. Expansions of Binet's Work: The
Stanford-Binet and Wechsler Tests 

Binet’s tests excited a great deal of inter¬ 
est. Educationally, the need for classifying 
students was a problem for the burgeon¬ 
ing public educational systems of the early 
twentieth century. Intellectually, the tests 
provided a foothold for applying the new, 
but rapidly developing, science of psychol¬ 
ogy. Lewis Terman, a professor at Stanford 
University, translated and modified Binet’s 
tests for use in the United States. Due to 
the interruptions caused by World War I, 
Terman did not complete his work until the 
1920s. The resulting test, the Stanford-Binet 
Intelligence Test, is still used in regularly 
updated form today. 

The Binet and Stanford-Binet tests 
were intended for use with schoolchildren, 
roughly age fifteen and younger. In the late 
1930s David Wechsler, a clinical psychologist 
working at New York City’s Bellevue Hos¬ 
pital, created a similar test for adults, the 
Wechsler-Bellevue test. It has subsequently
been modified into the Wechsler Adult Intel¬
ligence Scale (WAIS). It and a companion
test for children, the Wechsler Intelligence
Scale for Children (WISC), are the most 
widely used individually administered intel¬ 
ligence tests today. The Wechsler tests are 
often referred to as the “gold standard” for 
intelligence tests. 15 As of 2010 the fourth 

15 Matarazzo, 1972.

revisions of the tests, WAIS-IV and WISC-
IV, were in use.

Both the Wechsler and the Stanford- 
Binet tests are individually administered. 
The examinee sits down with a trained 
examiner and attempts to solve a series of 
problems, divided pragmatically into prob¬ 
lems that do or do not involve language, and 
that vary in the demands that they place on 
memory. Wechsler has described this as an 
opportunity for the examinee to display his 
or her cognitive skills during a standardized 
interview with an experienced observer. 16 

The resulting IQ scores have proven to be 
highly useful, both in the educational system 
and in other settings. For instance, the WAIS 
is widely used to evaluate a person who, for
whatever reason, is suspected of having a men¬
tal disability. Examples of such use are the adjudi¬
cation of legal competence and the analysis
of status following brain injury. Other appli¬ 
cations of these tests, and there are many, 
are extensions of these ideas. The focus is 
always on the individual. The cost of testing 
is evaluated relative to the potential benefits 
of the results in making judgments about 
an individual case. Are the decisions made 
about this person improved by knowing test 
scores, and if they are, is the value of a typ¬ 
ical decision enough to justify the costs of 
the test? 17 

The Wechsler and Stanford-Binet tests 
are not the only individually adminis¬ 
tered intelligence tests. However, they have 
played a very important role in the develop¬ 
ment of testing. Some further tests will be 
discussed in Chapter 2, where the tests are 
described in more detail. 

16 Wechsler, 1975. 

17 The law provides a dramatic example. The United 
States Constitution prohibits “cruel and unusual 
punishment” but does not define it. Several of the 
states and the current federal code include capital 
punishment for treason and for particularly vicious 
cases of murder. The Supreme Court has held that it 
would be cruel and unusual punishment to execute 
a mentally retarded person for committing a capi¬ 
tal crime, on the grounds that the person, though 
guilty, could not understand the crime or its con¬ 
sequences. As of 2009 a defendant’s attorney could 
offer an IQ test score as one piece of evidence of 
mental retardation and, hence, as a reason for not 
assessing the death penalty. 


1.2.6. The Development of Group Testing 

The next major step in intelligence testing 
was a spin-off of technology from a military 
application. When America entered World 
War I the army had to make rapid men¬ 
tal evaluations of large numbers of incom¬ 
ing soldiers. The War Department 18 spon¬ 
sored development of a test that could be 
administered to large groups of recruits. 
Psychologists responded with the Army 
Alpha Test, a written test suitable for group 
administration, and the Beta test, which 
could be given to nonliterates. 19 

The program was considered a success. 
Today the militaries in virtually all devel¬ 
oped nations routinely use cognitive tests to 
screen recruits. The tests that the U.S. mili¬ 
tary uses will be described in Chapter 2. 

The military tests are examples of per¬ 
sonnel classification tests. Cognitive tests for 
personnel classification are widely used in 
the civilian sector as well as in the military. 
The costs and benefits of testing within a 
personnel classification system are not the 
same as the costs and benefits of testing 
intended for individual counseling and/or 
placement. In a personnel classification sys¬ 
tem correct classifications have a value and 
incorrect classifications have a cost, as seen 
from the perspective of the institution set¬ 
ting the test, rather than as seen from the 
perspective of the examinee. A classifica¬ 
tion test is economical if, on the average, 
the cost of administering the test is less than 
the value of improved decision making. This 
view shifts the focus from decisions about an 
individual to the average value of a decision, 
calculated over the entire population. The 
shift greatly affects the economics of testing. 
A cheap test, which makes only a moderate 
improvement in the accuracy of the selec¬ 
tion decisions, administered to thousands of 
or even millions of people, can be a valuable 
classification instrument. 
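
The economic argument reduces to simple arithmetic, sketched below in Python. All of the figures are hypothetical and chosen only for illustration: the cost per test, the average gain per decision, and the number of examinees do not come from any real testing program, and the function names are likewise invented.

```python
def is_economical(cost_per_test, gain_per_decision):
    """A test is economical if, on average, it costs less than the
    value of the improved decision it makes possible."""
    return cost_per_test < gain_per_decision

def net_value_of_testing(n_examinees, cost_per_test, gain_per_decision):
    """Average gain minus cost, summed over everyone who is tested."""
    return n_examinees * (gain_per_decision - cost_per_test)

# Hypothetical figures: a cheap group test that improves decisions only
# modestly, compared with a costly individual test that improves them more.
cheap_group_test = net_value_of_testing(
    1_000_000, cost_per_test=5.0, gain_per_decision=20.0)
costly_individual_test = net_value_of_testing(
    1_000_000, cost_per_test=500.0, gain_per_decision=60.0)

print(f"cheap group test:       {cheap_group_test:,.0f}")
print(f"costly individual test: {costly_individual_test:,.0f}")
```

Under these made-up numbers the cheap test is worth a great deal when multiplied across a million examinees, while the more accurate but expensive test loses money for the institution. The same expensive test could still be justified for a single individual whose case carries high stakes, which is the contrast the text draws between the two settings.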

18 Now renamed the Department of the Army and 
absorbed into the larger Department of Defense. 

19 In the United States today illiteracy is almost a signal 
that the individual is of suspect intelligence. That 
was not true in 1917-18, for universal public edu¬ 
cation was not the case. Many people of normal 
intelligence were illiterate, especially in rural areas.

Well over a hundred group-administered
classification tests have been developed. 
They include the Scholastic Aptitude Test
(now known simply by its acronym, the SAT)
used in the college admissions process, and 
the General Aptitude Test Battery (GATB), 
which for years was used by the United 
States Department of Labor to provide a test 
score to guide in industrial hiring, and many 
more. 

Describing all the tests in use today liter¬ 
ally takes a volume, and the volume has to be 
updated annually. Some of the more promi¬ 
nent tests are described in detail in Chap¬ 
ter 2. The important point is that since the 
1930s intelligence testing has been widely 
used to make important decisions about 
people’s academic and vocational careers. 
Testing is also used as a guide in medical 
rehabilitation, such as evaluating the course 
of treatment following insults to the brain. 
The tests are also widely used in research on 
the description, causes, and consequences of 
being intelligent. 

1.3. Do the Tests Work? Efficacy 
and Controversy 

There has been a great deal of debate about 
how well intelligence tests work. The debate 
is not surprising, because the issue is inher¬ 
ently complicated. How accurate does a test 
have to be before we say that “it works”?
There can be different standards for differ¬ 
ent purposes. How well do we understand 
why the tests work? People are uncomfort¬ 
able using indicators that they do not under¬ 
stand. To the scientist interested in intel¬ 
ligence, though, this question ought to be 
a challenge. What are the consequences of 
basing important decisions about education, 
employment, and personal planning on test 
scores? 

We will look at the question of test accu¬ 
racy in considerable detail in Chapter 10. 
Here I will do three things: explain the cri¬ 
terion by which tests are judged as being 
accurate or not, provide just a few interest¬
ing statistics (saving the rest for Chapter 10),
and then describe the controversies. 


Tests would be perfectly accurate if, 
whenever a test score was used as a predic¬ 
tor, there was some critical score such that 
everyone who had a score equal to or higher 
than the critical score succeeded, and every¬ 
one with a lower score failed. No such intel¬ 
ligence test exists, and none ever will. Test 
scores are not perfect indicators of a person’s 
cognitive power; other things besides intel¬ 
lectual talent determine success; and success 
itself is not an either/or thing. The question 
is not whether or not test scores can be used 
to make perfect predictions of success or 
failure; the question is whether or not using 
test scores reliably improves our ability to 
predict who will succeed or fail. Tests that 
do this are said to be valid. Validity is a mat¬ 
ter of degree; the more using a test improves 
prediction, the higher the test’s validity. 
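
The idea of validity as improved prediction, rather than perfect prediction, can be illustrated with a small simulation. The sketch below rests on assumptions made purely for illustration: success is determined half by whatever the test measures and half by everything else, both drawn from normal distributions, and the cutoff score is zero. The names simulate_examinees, accuracy_with_test, and accuracy_without_test are hypothetical.

```python
import random

def simulate_examinees(n=10000, weight=0.5, seed=1):
    """Return (test score, succeeded) pairs for n hypothetical examinees.

    Success depends partly on what the test measures and partly on
    everything else, so no cutoff score can predict it perfectly.
    """
    rng = random.Random(seed)
    examinees = []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)
        other_factors = rng.gauss(0.0, 1.0)
        outcome = weight * score + (1.0 - weight) * other_factors
        examinees.append((score, outcome > 0.0))
    return examinees

def accuracy_with_test(examinees, cutoff=0.0):
    """Predict success for scores at or above the cutoff; return the hit rate."""
    hits = sum((score >= cutoff) == succeeded for score, succeeded in examinees)
    return hits / len(examinees)

def accuracy_without_test(examinees):
    """Best hit rate available when everyone receives the same prediction."""
    success_rate = sum(succeeded for _, succeeded in examinees) / len(examinees)
    return max(success_rate, 1.0 - success_rate)

examinees = simulate_examinees()
print(f"hit rate without the test: {accuracy_without_test(examinees):.3f}")
print(f"hit rate with the test:    {accuracy_with_test(examinees):.3f}")
```

In this toy setting the test never predicts perfectly, but it does noticeably better than guessing without it. Raising or lowering the weight argument changes how much the test helps, which corresponds to validity being a matter of degree.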

Here is an example that illustrates the 
issue. 

In the 1930s the Scottish psychologist 
Godfrey Thomson conceived the impressive 
plan of testing virtually the entire nation of 
Scotland. And he managed to do it. In 1932 
Thomson’s Moray House intelligence test 
was given to almost every eleven-year-old 
in Scotland, more than 80,000 in all. 

About seventy years later Ian Deary, a 
professor at the University of Edinburgh, 
and his colleague Lawrence Whalley traced 
2,300 of the people whom Thomson had 
tested. Figure 1.2, taken from their report, 
shows the fraction of the original respon¬ 
dents to the 1932 test who were still alive 
at different times from the 1930s until the 
start of the twenty-first century. The data is 
shown separately for men and women from 
the upper quartile (top 25%) and lower quar¬
tile (bottom 25%) of the distribution of test
scores. Intelligent people live longer! 

An intelligence test score obtained in 
childhood was a valid statistical predictor 
of length of life. This ought to catch the 
reader’s attention. 

Now let’s use this fact to uncover some of 
the complications dealing with test scores, 
and by implication, intelligence itself, as a 
predictor. 

We first must determine whether or not 
we are dealing with an artifact due to

Figure 1.2. The fraction of individuals surviving to various ages for
a cohort of approximately 2,200 Scottish children tested at age
eleven, in 1932. From Whalley and Deary, 2001, Figure 1. By
permission of the authors and the British Medical Journal.


some relationship that is forced between 
test scores and the outcome of interest. For 
instance, in the United States the SAT scores 
of people who enter college are higher than 
the test scores of people who do not enter 
college. That fact, alone, is uninteresting, 
because test scores are one of the determin¬ 
ers used to decide who gets to go to col¬ 
lege. However, this sort of problem did not 
arise in the Scottish study, for it is unlikely 
that a score on a test taken at age eleven 
has any direct influence on mortality. The 
statistic indicates some meaningful relation¬ 
ship between mortality and whatever the 
test measures. But what? 

I can imagine three different reasons why 
a test score might predict mortality. There 
might be a direct relationship. We know, 
today, that intelligence test scores are partly 
determined by the state of the brain. This 
statement will be documented in detail in 
Chapters 6 and 7. It could be that the test 
scores (imperfectly) revealed the state of the 


brain at age eleven, and this state carried 
forward over the years, producing an associ¬ 
ation between test scores and later mortal¬ 
ity. This explanation would have appealed 
to both Galton and Juan Huarte de San Juan. 

There could be an indirect relationship. 
Perhaps intelligence, as revealed by the tests, 
makes people less likely to do unhealth¬ 
ful things, like drinking alcohol to excess, 
or driving a motor vehicle after drinking 
to excess, or forgetting to take vaccinations 
against influenza, and so on. This explana¬ 
tion - that intelligence is related to finding 
smart solutions as you navigate your way 
through life - would have appealed to Binet. 

Finally, it might be that the relationship 
is not due to intelligence at all; it is due to 
a third variable that both causes test scores 
(although it is not intelligence as we nor¬ 
mally think of the term) and also influ¬ 
ences mortality. Parental socioeconomic sta¬ 
tus (Parental SES) is a term used to refer 
to the general “place in society” occupied
by a child’s family. It includes such things
as financial status, education, and generally 
beneficial lifestyle practices, such as hav¬ 
ing children inoculated against common dis¬ 
eases. It is also an indicator of the extent to 
which a child is likely to have “family con¬ 
nections” to help establish his or her own 
lifestyle. There is a positive statistical rela¬ 
tionship between children’s test scores and 
Parental SES. It might be that childhood 
test scores predict mortality because 
Parental SES assists in the development of 
a lifestyle that promotes longevity (or the 
opposite, for low test scores and Parental 
SES). The contemporary American psychol¬ 
ogist Richard Nisbett, a professor at the 
University of Michigan, has made a strong 
argument that many of the relationships 
between test scores and social outcomes are 
actually relationships between SES and the 
outcomes. 20 

Which of these explanations do you think 
accounts for the Whalley and Deary data? 
Could two or more explanations both be 
true? 

There are probabilistic relationships bet¬ 
ween test scores and a variety of interesting 
outcomes in life. In general, the higher a per¬ 
son’s test score, the more likely that person is 
to have good things happen to them. These 
include educational achievement, on-the- 
job performance, income, marital status, 
health and longevity. 21 Most of the statisti¬ 
cal relationships are high enough to be eco¬ 
nomically valuable evidence to guide deci¬ 
sion making, both in personal life planning 
and in personnel classification. But because 
the relationships are probabilistic they have 
to be understood in statistical terms. This 
demands some expertise on the part of the 
person who tries to interpret the facts. 

Test scores are not only correlated with 
outcomes in life, they are also correlated 

20 Nisbett, 2009. 

21 There are a great many studies backing up these 
assertions. See Neisser et al. (1996) for the state¬ 
ment of the situation by a review panel of the 
American Psychological Association, Gottfredson 
(1997, 2007b) for reviews, and Herrnstein and Mur¬ 
ray (1994) for a comprehensive analysis involving 
the AFQT. Chapter 10 discusses the topic in much 
more detail. 


with a variety of other measures that are 
correlated with the outcomes. The Parental 
SES example just given is illustrative. It 
is often hard to determine why there is a 
relationship between intelligence test scores 
and outcomes such as educational achieve¬ 
ment, job performance, and health. The 
fact that there is a correlation between an 
IQ score and X, where X is almost any 
life outcome, does not prove, alone, that 
“intelligence causes X.” There will almost 
always be several plausible interpretations 
for the relationship, and they are often not 
mutually exclusive. In the intelligence-life 
expectancy example three alternative expla¬ 
nations were offered, and none of them 
excluded the others. The questions “What 
causes intelligence?” and “What does intelli¬ 
gence cause?” are not easily answered. 
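
The third-variable explanation can be made concrete with a simulation. The sketch below is deliberately artificial: it assumes that Parental SES influences both test scores and length of life, and that test scores have no effect on length of life at all. It illustrates only that one explanation in isolation and says nothing about which explanation holds for the Scottish data. All names and numerical settings are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def simulate_third_variable(n=20000, seed=2):
    """SES drives both variables; scores never influence lifespans directly."""
    rng = random.Random(seed)
    ses, scores, lifespans = [], [], []
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)
        ses.append(s)
        scores.append(s + rng.gauss(0.0, 1.0))     # SES influences the test score
        lifespans.append(s + rng.gauss(0.0, 1.0))  # SES influences longevity
    return ses, scores, lifespans

ses, scores, lifespans = simulate_third_variable()
r_all = pearson(scores, lifespans)

# Restrict attention to people whose SES is nearly identical.
band = [i for i, s in enumerate(ses) if abs(s) < 0.1]
r_band = pearson([scores[i] for i in band], [lifespans[i] for i in band])

print(f"correlation in the whole sample:  {r_all:.3f}")
print(f"correlation within one SES band:  {r_band:.3f}")
```

In the whole simulated sample, scores and lifespans are clearly correlated even though neither affects the other. Among people with nearly the same SES the correlation largely disappears. Real data are harder to read, because direct, indirect, and third-variable relationships can all be present at once.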

Do not expect simple answers about 
intelligence in this book. I shall try to be as 
clear as I can, but some things are necessarily 
complex. 

If you encounter simple answers some¬ 
where else, my advice is to be very, very 
suspicious. 

1.3.1. The Controversies 

Because test scores are used to make major 
social decisions, it is hardly surprising that 
they have been controversial. One of the 
first of these debates occurred in the 1920s, 
when the social commentator Walter Lipp- 
mann published a series of articles criticiz¬ 
ing intelligence testing in The New Republic, 
an influential magazine in liberal intellectual 
circles. Lippmann’s articles provoked a furi¬ 
ous response by Terman and, in somewhat 
more measured terms, a critique by the Har¬ 
vard professor E. G. Boring. Boring’s reply 
included the somewhat famous definition of 
intelligence as whatever the tests test. 

The debate between Lippmann and Bor¬ 
ing is described in Panel 1.2. Their argument 
touched on concerns about intelligence test¬ 
ing that are still raised today. Other con¬ 
cerns have also been raised. Most of these 
concerns deal with complicated facts about 
what intelligence is worth, and who has 
it - topics that are discussed in detail in
Chapters 10 and 11. Here I will foreshadow
the discussion of the controversies briefly, 
by introducing the objections to the tests 
and outlining the nature of the responses. 

Objection 1. The tests cannot possibly 
work. It is unreasonable to believe that 
performance on a “Drop in from the Sky”
test, made up by people who do not
know the examinee, could possibly reveal
important mental traits. 

A slightly more sophisticated way of say¬ 
ing this is that the objector believes that 
human thought is so subtle that its nature 
could not possibly be captured in a single 
interview. This makes the objection seem 
much less anti-intellectual, but it amounts 
to the same thing as my original, italicized 
version. 

I do not know whether to call Objection 
1 a Know-Nothing attitude (I don't want 
to learn) or a Know-All attitude (I already
know the answer, so your tests are not necessary).
In any case, this objection dominated
the debate between Lippmann and Boring 
(Panel 1.2), and it still arises today. I do not 
believe that it is productive. 

Objection 2. The tests don't work. There 
are people who have only modest test 
scores and do well, and people who have
very high test scores and are not doing 
notably well. 

Here the objector asks that all predictions 
made on the basis of test scores be accurate. 
This is an impossible goal. Intelligence is not 
the only thing that determines success or 
failure, in academics or in life in general. 

I conjecture that the objection is rooted 
in two facts - a psychological one and a soci¬ 
ological one. 

The psychological fact is that when peo¬ 
ple have to deal with probabilistic relation¬ 
ships they try to get an intuitive feeling for 
the relationship, rather than analyzing num¬ 
bers. This leads to an overinterpretation of 
exceptional cases. Because the tests do work 
pretty well, on the average, failures stand 
out. People remember the case of the person
with low test scores who did brilliantly, or
the one with high test scores who did mis¬ 
erably. The fact that most people perform 
about as well as their test scores predict, 
with a small amount of variation, just does 
not stick in memory. 

The sociological fact is that many of us 
live in a cognitively stratified society. Chil¬ 
dren of parents with high, middle, or low 
SES go to schools with children of the same 
background as their own. Academic test¬ 
ing mechanisms stratify college students on 
the basis of SAT or similar test scores. In 
the workplace our coworkers tend to be 
of roughly the same intelligence as we are. 
When we deal with people of greatly varied 
intelligence we tend to do so in a stereotyped 
way. A sales clerk, for instance, encounters 
customers of highly varied intelligence, but 
does not interact with them in a way that 
would reveal their intelligence. Stratification 
occurs at home, because there is a great deal 
of residential segregation by socioeconomic 
status; therefore, neighborhood life is
“stratified by intelligence.”

The result is that within a person’s local 
society there will not be a great deal of varia¬ 
tion in intelligence, so other factors, such as 
variations in personality, will play a role in 
determining social success. Therefore, when 
people try to evaluate intelligence by refer¬ 
ring to their personal experiences they are 
likely to undervalue the role of intelligence 
in the general society. 
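
The stratification argument is a statement about restriction of range, and a small simulation can make it concrete. The sketch below is illustrative only: the model (success as a weighted sum of intelligence and other factors), the weight of 0.5, and the band of scores that defines a "local society" are all assumptions chosen for the example, not values taken from any study.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, weight=0.5, band=(0.5, 1.0), seed=1):
    """Compare the intelligence-success correlation in a whole population
    with the correlation inside one narrow band of intelligence.

    All quantities are standardized scores; the model is hypothetical.
    """
    rng = random.Random(seed)
    noise_sd = (1 - weight ** 2) ** 0.5  # keeps success at unit variance
    iq = [rng.gauss(0, 1) for _ in range(n)]
    success = [weight * x + noise_sd * rng.gauss(0, 1) for x in iq]

    r_population = pearson_r(iq, success)

    # A "local society": only people whose scores fall inside the band.
    local = [(x, s) for x, s in zip(iq, success) if band[0] <= x <= band[1]]
    r_local = pearson_r([x for x, _ in local], [s for _, s in local])
    return r_population, r_local


r_population, r_local = simulate()
print(f"whole population: r = {r_population:.2f}")
print(f"local society:    r = {r_local:.2f}")
```

Under these assumptions the correlation in the whole population is close to .5, while the correlation within the narrow band is far smaller, even though intelligence has exactly the same influence on every simulated individual in both cases.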

The poor statistician has an uphill climb 
when trying to show that intelligence really 
is an important part of the big picture. 

Objection 3. The tests work only in the
academic arena.

This objection flies in the face of the evi¬ 
dence. The point will be addressed directly 
in Chapter 10, where evidence will be given 
showing that test scores predict success 
in both the academic and the industrial/ 
economic sphere. Because the argument has 
to be made statistically, it has to counter the 
results of the intellectual stratification of the 
workplace, as described in the response to 
Objection 2.

There is also another problem. Leaving
aside spectacular cases, like billionaires
heading companies that pioneer new tech¬ 
nology, it is hard to establish who is suc¬ 
ceeding in the workplace on an individual 
basis. This makes the statistical analyses 
more complex than they are in the case 
of analyzing academic success, and hence 
harder to explain to the nonstatistician. 

Objection 4. The tests work only for 
certain demographic groups, notably 
middle-class whites. The tests do not pre¬ 
dict well for other groups. 

This issue will be addressed directly in 
Chapter 11. To foreshadow: although some data
are lacking, it is usually found that test
scores have about the same power to predict 
achievement in all demographic groups. 

Objection 5. The tests should not be 
used because they are prejudiced against 
minority group members, who tend to get 
low scores. 

This objection raises one of the most 
incendiary topics in psychology: the possi¬ 
bility that there are individual differences in 
intelligence across racial and ethnic groups. 
The topic is discussed in detail in Chapter 11. 
Here I present a brief statement of the issue. 

In the United States, and in other indus¬ 
trially developed nations, there are differ¬ 
ences in the average intelligence test scores 
of different racial/ethnic groups. In the 
United States the order of scores, from 
highest to lowest, is Asian-derived groups, 
European-derived groups, Latin-American- 
derived groups, and then African-derived 
groups. The Asian-European gap is reliable, 
but rather small. The European-Latino and 
European-African gaps are substantial. The 
debate is not over the existence of differences;
it is over their meaning and their
implications for action. These are two separate
issues.

If the tests predicted performance only 
in the white group, discrepancies in test 
scores across groups could simply be disregarded.
However, that is not the case; the
test scores do predict performance in all
racial/ethnic groups. Although many strong 
opinions have been expressed, there is no 
consensus about the causes of group dif¬ 
ferences in IQ scores. Because this debate 
requires a good understanding of intelli¬ 
gence itself, I have postponed the discus¬ 
sion of racial and ethnic differences until the 
next-to-last chapter. 

Should test scores be used in person¬ 
nel screening if doing so would reduce 
the proportion of minority applicants who 
are accepted for employment or education? 
There is no justification for using a test that 
would have an adverse impact on one group 
or another if that test is not a valid predictor 
of performance. If the test is a valid indi¬ 
cator - as intelligence tests usually are, to 
some degree - then the policy maker is faced 
with a trade-off. Should the best people be 
selected, regardless of group membership, 
or should some effort be made to balance 
rates of acceptance across racial and ethnic 
groups? This is a policy issue, not a scientific 
one. Scientific research can provide informa¬ 
tion about the costs and benefits of a policy, 
but the decision is up to the policy maker. 

1.4. A Framework for Thinking 
about Intelligence 

1.4.1. Manifest and Latent Variables: 
Definition and Diagramming Conventions 

Many of the arguments over intelligence 
have been less informative than they might 
be, because the arguers have not been clear 
about what their concept of intelligence is, 
or about what they think the relation is 
between test scores and intelligence, in the 
more general sense of individual differences 
in mental competence. Here I present my 
own views. They represent an expansion of 
a model of the causes and consequences 
of mental competence that I developed 
together with Jerry Carlson, a professor at 
the University of California, Riverside. 22 We 
did not present a new theory of what intelli¬ 
gence is, nor do I here. The goal is to provide 

22 Hunt & Carlson, 2007a.

Figure 1.3. A model of the development and
utilization of cognitive abilities (“Intelligence”). 
Intelligence is a latent variable, indicated by 
manifest variables that include but are not 
limited to intelligence test scores. Intelligence 
influences, and is influenced by, a number of 
environmental and biological variables. 

a framework for discussing theories and facts 
about intelligence. 

Figure 1.3 presents the model I will use 
throughout this book. Before discussing its 
content a word is in order about the notation 
used in the figure, for these notational con¬ 
ventions will be followed throughout this 
book. 

Rectangles will be used to represent man¬ 
ifest variables, things that can be seen and, 
in principle, measured. Test scores, grades, 
and salaries are examples of manifest vari¬ 
ables. Ellipses will represent latent variables, 
conceptual entities that are used in theories. 
Intelligence, cultural knowledge, and socio¬ 
economic status (SES) are latent variables. 
They can be defined only by their (imper¬ 
fect) manifest indicators. Latent variables 
are never defined or measured directly; their 
values are inferred from observation of the 
appropriate manifest variables. For example, 
SES is inferred from education, wealth, and 
residence. Intelligence is inferred from test 
scores and, in many cases, other indices of 
cognition. 

We also have to distinguish among three 
separate relations between variables: causa¬ 
tion, reciprocal causation, and correlation. 
Causation will be indicated by a single-headed
arrow. For instance, in Figure 1.3
there is a single-headed arrow between
genes and brain structure, because genetic 
makeup does determine brain structure. 
Reciprocal causation, in which one variable 
causes another and then the second feeds 
back into the first, will be indicated by a 
double-headed arrow. The double-headed 
arrow between intelligence and the social 
environment indicates that a person's intel¬ 
ligence partly determines his or her social 
environment, which in turn influences the 
further development of intelligence. Corre¬ 
lation, in which two variables tend to occur 
together, without any implication of causa¬ 
tion, will be indicated by a double-headed 
arrow with a dashed line between the arrow¬ 
heads. There is an example of this in Fig¬ 
ure 1.3, and we will encounter examples 
of correlation without causation in other 
diagrams. 

The debate between Lippmann and 
Boring can be cast in terms of manifest and 
latent variables. Lippmann thought of intel¬ 
ligence as a latent variable, but failed to grap¬ 
ple with the question of how it should be 
related to manifest variables. Boring either 
did not make the distinction, or wanted to 
assert that his manifest variable, the test 
score, was a perfect indicator of the latent 
variable. I do not think that either position 
can be maintained. 

1.4.2. The Causes of Intelligence 

Now let us turn to substance, working from
“the beginning,” at the left side of Figure 1.3,
toward the expression of intelligence repre¬ 
sented on the right-hand side. 

It all starts with the genes. A person’s 
genetic makeup determines his or her poten¬ 
tial for the development of brain structures 
that support all activities, including cog¬ 
nition. There are individual differences in 
genetic makeup; otherwise we would all be 
clones. These differences do have implica¬ 
tions for the development of intelligence, 
even though most people probably operate 
well below their genetic potential. 

Although the genotype is established at 
conception, parts of the genotype may not 
be expressed until certain ages are reached.
For example, there are a number of medical
conditions, such as Alzheimer’s disease,
that have some genetic basis, but are not 
displayed until well past childhood. 

The extent to which the genetic potential 
is realized depends upon the extent to which 
the environment encourages the develop¬ 
ment of intelligence. In modern developed 
societies it appears that somewhere upwards 
of 50% of the variance in IQ scores can 
be statistically associated with genetic vari¬ 
ation. However, no one, ever, inherited a 
test score in the same sense that they inher¬ 
ited the color of their eyes. People do dif¬ 
fer in the extent to which they have inher¬ 
ited brain mechanisms that allow them to 
deal with their society in a way that pro¬ 
duces the mental capabilities required to 
solve the problems proposed on a cogni¬ 
tive test. My apologies for a complicated 
sentence, but I cannot think of any other 
way to make this very important point. 
Genetic capacities unfold throughout life. In 
fact, the association of genetics with intelli¬ 
gence test scores is higher in old age than in 
adolescence. 23 

The extent to which the genetic poten¬ 
tial is realized is determined by the physical 
and social environment. Here is a striking 
example. 

If a pregnant woman abuses alcohol her 
child may be born with fetal alcohol syn¬ 
drome, a serious form of mental retardation. 
The likelihood that this physical condition
will occur depends on the social environment.
Fetal alcohol syndrome does not occur in
societies that enforce abstinence. It can be a 
substantial problem in societies where alco¬ 
hol is freely available as a recreational drug, 
especially if the social stresses that lead to 
alcohol abuse are present. 

Social influences also act upon the devel¬ 
oping mind. At this point, though, we have 
to shift from the concept of brain to the 
concept of mind. Ultimately everything is 
in the brain. Nevertheless, it is often useful 
to distinguish between mental capabilities 
that are determined by a person’s capacity to 

23 McGue et al., 1993. 


process information in the abstract and capa¬ 
bilities that are determined by the possession 
of information, either about the world or 
about how to solve problems in the world. 
Information and problem-solving styles are 
heavily influenced by the society within 
which a person lives. 

Arguably the most striking example is 
literacy. The ability to read appears to 
be associated with willingness to evaluate 
abstract arguments and to take multiple per¬ 
spectives, a behavior that is tested in many 
intelligence tests. 24 These can be considered 
direct influences of literacy upon intel¬ 
ligence. Literacy also has an indirect 
influence, for it opens the door to formal 
education, and education is, by definition, 
a major avenue for passing on culturally 
acquired knowledge. Literacy facilitates the 
transmission of cultural knowledge, and 
cultural knowledge allows us to behave 
more intelligently. 

Medicine offers an example. Modern 
health workers wash their hands when 
they move from patient to patient. Prior 
to the mid-nineteenth century this was
not seen as necessary. Does this make 
today’s physicians more intelligent than the 
physicians of George Washington’s day? In 
an important sense, they are. Knowledge is 
cognitive power. 

There is a two-way interaction between 
brain and mind. One of the major limita¬ 
tions on knowledge acquisition is the ability 
to concentrate mental effort on a topic, in 
the face of distractions. The ability to con¬ 
centrate depends on how well certain brain 
structures work, primarily but not exclu¬ 
sively in the forebrain. It also depends upon 
how efficiently information about the exter¬ 
nal world is coded inside the nervous system. 
The coding systems people use depend very 
much upon their experience with the topic 
at hand. 

Intelligence is a personal trait produced 
by an interaction between genetic potential 
and environmental support. Where do IQ 
scores fit into this picture? 

24 Cole, 2005; Wolf, 2007.

1.4.3. The Measurement of Intelligence

The center portion of Figure 1.3 makes a 
point that is obvious when you think about 
it, but that is surprisingly often forgotten. 

A person's intelligence test score is pro¬ 
duced by two things: the social decision to 
construct an intelligence test in a particu¬ 
lar way and the examinee's ability to deal 
with the test once it has been constructed. 
Different societies might construct differ¬ 
ent tests, depending upon the mental capa¬ 
bilities that each society values. This does 
not mean that tests are narrowly culture- 
bound, for some mental qualities are seen 
as vital in all societies. For instance, all soci¬ 
eties demand that their members learn their 
native language. On the other hand, soci¬ 
eties may differ in the emphasis that they 
place on other aspects of cognition. 

Both these points were illustrated nicely 
by an anecdote told me by Manuel de Juan- 
Espinosa, a Spanish psychologist who has 
studied conceptions of intelligence held by 
the Fang, a society of mixed agricultural¬ 
ists and hunters in Equatorial Guinea. When 
you ask people in Western society to list 
the attributes of an intelligent person, you 
generally get statements about the ability 
to solve abstract problems and the ability 
to comprehend and use language. Spatial 
orientation, the ability to locate oneself in 
the physical environment, is either not men¬ 
tioned or mentioned far down on the list. 
When the topic is brought to their attention 
people do agree that it is intelligent to be 
able to find your way around the neighbor¬ 
hood. In our society, though, this is usually 
not an important skill. Our intelligence tests 
include only a few evaluations of spatial ori¬ 
entation. 

When the Fang list the qualities of an 
intelligent person they say, “Intelligent peo¬ 
ple do not get lost in the forest.” This does 
not mean that the Fang devalue the sorts 
of verbal skills that Westerners mention. In 
fact, they stress verbal skills, in much the 
same way that Westerners do. If the Fang 
were to construct intelligence tests they 
would include tests of verbal skills, as ours 
do. The Fang might look into the matter
of spatial orientation in more detail than
we do. 

IQ tests, and tests related to IQ tests, such 
as the SAT and AFQT, are artifacts of the 
cultures in which they arose. They test some 
aspects of intelligence but not others. How¬ 
ever, the tests are not arbitrary. 

IQ tests would not have survived, as arti¬ 
facts, unless test scores could be used as 
(imperfect) predictors of what our society 
sees as socially important behaviors, such as 
academic and social achievement. Because 
the test scores do meet this criterion, the 
tests must either evaluate mental skills that 
are used by the society or they must evalu¬ 
ate mental skills that are not used, in them¬ 
selves, but whose possession is highly corre¬ 
lated with the possession of skills that can 
be used. That is, the tests might be like an 
Army physical examination where the can¬ 
didate is required to do push-ups. Soldiers 
are not going to do push-ups in combat, 
but the ability to do push-ups is correlated 
with the ability to move heavy objects (e.g., 
carrying artillery shells to a gun position), 
which soldiers may have to do. The same 
argument may apply to the mental gymnas¬ 
tics required to perform well on intelligence 
tests. 

While all human societies are not iden¬ 
tical, there is a core set of cognitive skills 
that all societies rely upon. All societies 
demand that their members learn to speak 
the native language, be able to control atten¬ 
tion, and, by the standards of all other ani¬ 
mals in the world, remember events very 
well. Therefore, an intelligence test that is 
valid in one culture is unlikely to be entirely 
invalid in another, although its validity may 
be reduced. 

Individual differences in cognitive skills 
might, in principle, either be due to posses¬ 
sion of a great many special-purpose brain 
mechanisms or be due to possession of 
very general information-processing capa¬ 
bilities that can be applied to all men¬ 
tal challenges. Intermediate solutions are 
possible; we might have a single general 
processing capacity, augmented by special 
processors. To the extent that the evolution
of our species has produced a general
problem-solving brain, it does not matter
precisely how that brain is evaluated, for a 
person's behavior in one cognitively chal¬ 
lenging situation will predict how well he or 
she deals with other situations. 

There is a surprising amount of evi¬ 
dence that evolution has taken the general 
problem-solving solution. 

1.4.4. The Uses of Intelligence 

We now come to the right-hand section of 
the diagram, dealing with the results of intel¬ 
ligence. 

People use their cognitive abilities to 
define their environments. Consider good 
health practices. Sometimes very bright 
people become alcoholics and/or crash cars, 
both activities that can lead to brain dam¬ 
age. But, on the average, people with high 
intelligence test scores do not do such things, 
and so enjoy better health. 25 Once again, we 
see how intelligence and the environment 
interact. 

Cognitive abilities are not the only abil¬ 
ities that we have. Differences in behavior 
are also produced by individual differences 
in a variety of emotional-motivational traits. 
These are lumped together loosely under 
the title “personality.” It is not clear where 
to draw the line between intelligence and 
personality. For instance, conscientiousness, 
the tendency to fulfill obligations to others 
(including one’s employers) is usually con¬ 
sidered a personality trait. However, it is 
possible to see conscientiousness as an off¬ 
shoot of intelligence. It makes sense to be 
conscientious in fulfilling obligations to peo¬ 
ple who control resources that you want. 

This is hardly a new observation. Machiavelli’s
famous sixteenth-century discourse
on political behavior, The Prince, contained 
cogent arguments for displaying certain 
personality traits, including honesty and 
conscientiousness, because such behavior 
is in one’s enlightened self-interest. It is 
intelligent to be thought trustworthy and 
reliable. 

25 For a typical recent study, see Batty et al., 2006.


We can make a distinction between cog¬ 
nition and personality by asking if we are 
talking about whether a person can do or 
will do a certain behavior. To the extent that 
the answer is “can do” the mental acts con¬ 
trolling the behavior are part of cognition, 
and hence individual differences in them are 
part of intelligence. To the extent that the 
answer is “will do” the mental control is part 
of motivation, and individual differences are 
part of personality. 

Any particular action has to satisfy 
both “can do” and “will do” requirements. 
Although this is hardly a profound state¬ 
ment, it is surprising how often explana¬ 
tions of behavior focus on personality to the 
exclusion of intelligence or vice versa. 

1.4.5. The Results of Intelligence 

Intelligence shows itself in two ways: by 
test scores and by socially relevant behav¬ 
iors. Test scores are easy to analyze; socially 
relevant behavior is hard to analyze. Nev¬ 
ertheless, socially relevant behaviors are far 
more important than test scores. 

There are statistical associations (cor¬ 
relations) between intelligence test scores 
and measures of socially relevant behavior 
including academic achievement, income, 
health, and occupation of prestigious posi¬ 
tions in society. 26 These associations are 
facts. Arguing about why they occur is a 
reasonable thing to do. As every student in 
elementary courses on experimental design 
is told, correlation does not imply causation. 
Rather, the fact that intelligence test scores 
and measures of socially relevant behaviors 
are correlated suggests a number of possible 
causes, all of which are worth investigation. 

Performance on an intelligence test and 
performance in socially relevant situations 
might both depend upon the same cogni¬ 
tive processes. This is what most researchers 
on intelligence believe. However, the sta¬ 
tistical associations, although not negligible, 
are not compellingly large. They generally 
range from a correlation coefficient of .2 to 

26 Gottfredson, 1997; Herrnstein & Murray, 1994. See
the extensive discussion in Chapter 10.

.5, and in a few situations correlations as high
as .8 have been reported. Just what these 
statistics mean is explained in more detail in 
Chapter 2. Temporarily, I ask readers to take 
on faith the statement that the correlations 
are high enough to indicate that test scores 
and behaviors tap some common traits, but 
low enough to indicate that the traits that 
affect the tests and the behaviors are not 
exactly the same. 
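
One standard way to read these numbers, offered here only as a rough guide ahead of the fuller treatment in Chapter 2, is to square the correlation coefficient: the square gives the proportion of variance in one measure that is linearly associated with the other. The function name below is an illustrative choice, and the three coefficients are simply the ones quoted in the text.

```python
def shared_variance(r):
    """Proportion of variance linearly shared by two measures
    whose correlation coefficient is r (that is, r squared)."""
    return r * r


# The coefficients mentioned in the text: the usual range and the rare high value.
for r in (0.2, 0.5, 0.8):
    print(f"r = {r:.1f} -> shared variance = {shared_variance(r):.0%}")
```

On this reading a correlation of .2 corresponds to 4% shared variance, .5 to 25%, and .8 to 64%, which is one way of seeing why the associations are neither negligible nor overwhelming.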

It could be that the test scores themselves 
are causing the behaviors but that the cog¬ 
nitive capabilities required by the tests are 
not. This might happen in two ways. One is 
that knowledge of a high or low test score 
could cause other people to treat the exam¬ 
inee differentially. It has been claimed that 
this is the case in education, because teach¬ 
ers may respond differently to students with 
high or low IQ scores. 27 

Test scores will influence behavior if the 
scores are used to decide who is offered 
a chance to behave in a certain way. The 
SAT, the ASVAB, and the Department of 
Defense's Officers Qualifying Test (OQT) 
are examples. Other things being equal, the 
people with the highest test scores will be 
allowed to enter a prestigious university, be 
commissioned as an officer in the armed 
services, and so forth. Subsequent behav¬ 
ior will be guided by the experience of hav¬ 
ing been in the university, been a commis¬ 
sioned officer, and so forth. On the one 
hand, it can be argued that this is appro¬ 
priate. If a test correctly identifies people 
who have the cognitive capability to, say, 
benefit intellectually from an Ivy League 
education, then the test is an appropriate 
gatekeeper. On the other hand, it could 
be argued that the benefits one gets from 
being allowed to enter a certain social stra¬ 
tum depend very little on cognitive abili¬ 
ties. Colloquially, it’s not what you learn 

27 Should teachers know their students’ IQ scores? 
There are rational arguments for and against the 
practice. If teachers know students’ IQ scores they 
might offer special instruction to students they 
think are bright, or they might give up on teach¬ 
ing low-scoring students. Looked at another way, 
the information could be used diagnostically, help¬ 
ing teachers tailor instruction to the student. 



Figure 1.4. People possess cognitive skills 
(conceptual intelligence) to varying degrees. 
Some of these skills are evaluated by the tests. 
Of the skills evaluated by the tests some are 
specific to the test situation, and others are 
relevant to social behavior in general. 

at Harvard (Yale, Stanford, Duke, Oxford, 
or Cambridge) that determines your subse¬ 
quent success in life, it is who you meet in 
college. Either of these mechanisms could 
produce a correlation between test scores 
and later success, on a between-institution 
basis, because the test scores were used to 
determine who got in, but not on a within- 
institution basis, because the test scores did 
not tap the abilities required to benefit from 
the experience. 
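
The gatekeeping argument can be sketched as a simulation. Everything in it is an assumption made for illustration: the admission cutoff, the size of the benefit conferred by admission, and above all the deliberately extreme premise that later success depends only on having been admitted and on luck, never on the test score itself.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_gatekeeping(n=20000, cutoff=0.84, bonus=1.0, seed=7):
    """Hypothetical world in which a test only decides who gets in.

    Returns the score-success correlation pooled over everyone,
    among the admitted, and among the non-admitted.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(0, 1) for _ in range(n)]
    admitted = [s > cutoff for s in scores]
    # Success = benefit of admission (the "who you meet" effect) + luck.
    success = [(bonus if a else 0.0) + rng.gauss(0, 1) for a in admitted]

    pooled = pearson_r(scores, success)
    in_scores = [s for s, a in zip(scores, admitted) if a]
    in_success = [y for y, a in zip(success, admitted) if a]
    out_scores = [s for s, a in zip(scores, admitted) if not a]
    out_success = [y for y, a in zip(success, admitted) if not a]
    return (pooled,
            pearson_r(in_scores, in_success),
            pearson_r(out_scores, out_success))


pooled, within_admitted, within_others = simulate_gatekeeping()
print(f"pooled over everyone: r = {pooled:.2f}")
print(f"among the admitted:   r = {within_admitted:.2f}")
print(f"among the others:     r = {within_others:.2f}")
```

With these made-up settings the pooled correlation between score and success is clearly positive, while the correlation inside each group is near zero. A real data set showing positive correlations within institutions as well would therefore count against the pure gatekeeping account.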

The two explanations are not mutually 
exclusive. My own bet is that both factors 
operate. 

Figure 1.4 summarizes the argument. 
Socially relevant behaviors are partly deter¬ 
mined by cognitive skills. Variations in 
these behaviors are, conceptually, intelli¬ 
gence. Some of the skills that define concep¬ 
tual intelligence are reflected in test scores. 
Other socially important cognitive skills that 
are part of intelligence are not reflected in 
test scores, and test scores also reflect test- 
specific skills that are not related to socially 
relevant behavior. 

Lippmann was right that IQ scores do not
measure everything; Boring was right that
they do measure something.

Intelligence tests do a good job of evaluat¬ 
ing an examinee's ability to respond rapidly 
to problems of varying degrees of complex¬ 
ity. In the following chapters I discuss evi¬ 
dence showing that IQ and similar tests are 
statistically related to the cognitive abili¬ 
ties that underlie socially relevant behaviors, 
ranging literally from becoming a judge to
becoming a criminal. However, important
cognitive abilities required for socially rele¬ 
vant behavior are not tapped by the tests. 
What might these be? 

Any testing method that relies on the 
conventional “Drop in from the Sky”
paradigm cannot evaluate abilities that 
reveal themselves over a relatively long 
period of time, such as the ability to plan, to 
allocate time for extended courses of action, 
or to integrate information from multiple 
sources. It is difficult, if not impossible, to 
tap these skills in a test that seldom takes 
more than three hours and that consists of 
unrelated problems that, individually, take 
only a few minutes to solve. Overreliance 
on conventional testing has greatly limited 
modern research on intelligence. 

1.4.6. Cause and Effect in Intelligence 
Research 

The arrows going back and forth in Fig¬ 
ure 1.3 highlight how hard it is to deter¬ 
mine cause and effect when studying intelli¬ 
gence. If everything were simple we would 
place causal variables on the left, intelli¬ 
gence in the center, and the effects of hav¬ 
ing intelligence on the right. To some extent 
this can be done. Genetics and a child’s 
physical and social environment do pro¬ 
duce intelligence, and a person’s intelligence 
does produce socially relevant behavior, 
and thus alters the person’s environment. 
The problem is with the feedback. Intel¬ 
ligence influences a person’s environment, 
and feedback from the environment alters 
intelligence. 

Take the case of aging. As we grow
older two different processes influence intelligence.
There is a decline in brain function.
At the same time, as people live they
acquire “wisdom,” better and better knowledge
about how the culture runs. There
are huge individual differences in both of 
these processes. Some people remain cogni¬ 
tively fit until great age; others descend into 
near-senility at little past fifty. Some peo¬ 
ple acquire wisdom as they pass through the 
world; others only have experiences. In both 
these situations intelligence appears to act
as both cause and effect. Other things being
equal, more intelligent adults are more likely 
to maintain healthy lifestyles than less intel¬ 
ligent ones, and are more open to engaging 
with, and thus extracting wisdom from, the 
world about them. 

The interaction between intelligence and 
the environment has posed a major prob¬ 
lem for researchers. We know far less than 
we need to know about the development of 
cognitive power over the adult life span. It 
is reasonably easy to measure cognition and 
other psychological variables as long as peo¬ 
ple are in the educational system. Similarly, 
it is relatively easy to obtain access to retired 
people, through social and medical support 
institutions, ranging from gray-haired hikers 
pausing at an elder hostel to the patients in 
a medical care facility. It is much harder to 
obtain access to people in the working years, 
for the simple reason that they are busy at 
work and raising families. 

This is a serious situation, for adult intel¬ 
ligence is not inert. It rises to meet environ¬ 
mental challenges. 

1.5. Reaction Ranges and 
the Challenge Hypothesis 

Imagine a hypothetical person, Harry P., at 
age fifty. We want to estimate Harry’s intel¬ 
ligence without actually measuring it. Here 
is what we can do. 

1. Our initial estimate is the average intel¬ 
ligence of a person at age fifty. 

2. Collect facts about Harry’s physio¬ 
logical status, including genetic back¬ 
ground, medical history, present eating 
and drinking habits, and so on. Use these 
to compute a biological correction fac¬ 
tor to the initial estimate. 

3. Collect facts about Harry’s social envi¬ 
ronment, including education, marital 
status, hobbies, profession, and so forth. 
Compute a social correction factor, and 
add it in. The twice-adjusted figure is 
our estimate. 
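
The three steps above amount to a baseline plus two additive corrections. A minimal sketch of that arithmetic follows. The function name, the baseline of 100, and the two correction values are hypothetical, chosen only to show the calculation; the text does not say how the corrections would actually be computed.

```python
def estimate_intelligence(population_mean, biological_correction, social_correction):
    """Twice-adjusted estimate: a baseline plus two additive corrections."""
    # Step 1: start from the average for a person of the same age.
    # Step 2: apply the biological correction to that initial estimate.
    once_adjusted = population_mean + biological_correction
    # Step 3: add in the social correction.
    return once_adjusted + social_correction


# Hypothetical numbers for Harry P., used only to illustrate the arithmetic.
harry = estimate_intelligence(
    population_mean=100.0,
    biological_correction=-3.0,
    social_correction=5.0,
)
print(harry)
```

The sketch yields a single number and nothing more: it carries no information about how that level of intelligence came about.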

The result will be an estimate of current intelligence. We
would not know how that intelligence had been acquired.
Estimation alone does not explain the dynamics of intelligence.

Figure 1.5. The concept of reaction range. The ordinate represents
the quality of observable performance, and the abscissa represents
the extent to which the environment supports development of
cognitive skills. A person’s level of cognitive performance is
determined by the combination of reaction range and quality of the
environment. Persons A, B, and C each have a unique potential for
cognitive performance, indicated by the three lines. Their actual
performance will be determined by the quality of the environment.
If the environment goes from very poor (at the far left of the figure)
to moderate (toward the middle), there will be considerable
improvement of performance. Going from moderate to good
environments (toward the right) does not result in a great deal of
improvement.

A person’s genetic makeup does not provide that person with
a certain number of “intelligence units,” any more than it
provides a person with a certain number of points on an IQ
test. Genetic status provides a reaction range, a range of
levels of intelligence. Environmental factors determine where
the person will operate within that range.

Figure 1.5 displays three hypothetical reaction ranges.
Cognitive functioning is shown on the ordinate (y-axis), the
level of favorableness of the environment on the abscissa
(x-axis). Three different reaction ranges are shown. Where a
person actually functions is determined by the point at which
a vertical line drawn from the environment’s rating crosses
the reaction range line. Observable cognitive performance is
determined by the combination of genetic reaction range and
environmental quality. The level of genetic inheritance cannot
be inferred from IQ scores unless environmental quality is
known, nor can environmental quality be inferred from IQ
scores unless genetic inheritance is known.

Varieties of this theme will appear 
throughout our discussion of intelligence. 
For example, any difference in the cogni¬ 
tive performance of identical (monozygotic 
or MZ) twins can be attributed to the envi¬ 
ronment because MZ twins, having identi¬ 
cal genotypes, will have identical genetically 
determined reaction ranges. 

The concept of reaction range applies to the environment as
well. This can be seen in Figure 1.6, which is simply a
“cleaned up” version of Figure 1.5, in which three vertical
lines have been drawn upward from the environmental quality
axis, each line representing a different quality of
environment. Each of the lines intersects the different
genetically determined reaction ranges at different points,
resulting in a variety of cognitive performances, even though
the environment is identical for all individuals.

Figure 1.6. Environmental reaction ranges. Suppose the
environmental quality is fixed (arbitrarily) at a quality of 10 for all
individuals. Differences in cognitive performance will occur due to
individual differences in genetic potential. However, if individual
A is placed in an environment of quality 15 and C in an
environment of quality 5, A will outperform C even though C has
greater genetic potential.

Figures 1.5 and 1.6 show the genetic reac¬ 
tion ranges as negatively accelerated curves 
that rise steeply at first, and then flatten 
out as they approach an asymptotic value. 
This was an arbitrary choice, for both the 
cognitive performance and environmental 
quality axes are shown on arbitrary scales. 
I chose to display negatively accelerated 
curves because of my conjecture that, in 
fact, attempts to improve cognition will 
result in negatively accelerated curves. Why 
do I make this conjecture? 

There are a number of ways of producing physical environments
that greatly constrain the development of intelligence.
Prolonged famine is one; infection of the brain is another.
Once these disaster states are avoided, the incremental
effects of improving the physical environment are probably
rather small. The same thing is true for the establishment of
genetic potential. There are many “catastrophic” genetic
conditions, where the presence of a single anomaly greatly
restricts cognitive development, but if there is no anomaly
genetic potential seems to depend upon the combined effects of
a large number of genes, no one of which contributes a great
deal. These matters are discussed in some detail in Chapters 8
and 9. There are lots of ways to restrict intelligence, but we
know of relatively few ways to expand it. This implies
negatively accelerated growth curves.
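
The reaction-range idea in Figures 1.5 and 1.6 can be made concrete with a small sketch. It assumes, purely for illustration, a saturating exponential as the negatively accelerated curve. The function name, the rate constant, and the potentials assigned to A, B, and C are hypothetical and are not read off the figures; both axes remain arbitrary scales.

```python
import math


def performance(potential, env_quality, rate=0.2):
    """Observable performance for a given genetic potential and environment.

    Assumed form: a saturating exponential that rises steeply in poor
    environments and flattens toward `potential` (the asymptote).
    """
    return potential * (1.0 - math.exp(-rate * env_quality))


# Hypothetical asymptotes: C has the greatest potential, A the least.
potentials = {"A": 90.0, "B": 100.0, "C": 110.0}

# Same environment (quality 10): the ranking follows genetic potential.
same_env = {name: performance(p, 10) for name, p in potentials.items()}

# Different environments: A at quality 15 versus C at quality 5.
a_enriched = performance(potentials["A"], 15)
c_deprived = performance(potentials["C"], 5)

# Diminishing returns for B: poor-to-moderate versus moderate-to-good.
gain_low = performance(potentials["B"], 10) - performance(potentials["B"], 0)
gain_high = performance(potentials["B"], 20) - performance(potentials["B"], 10)

print(same_env)
print(a_enriched > c_deprived)  # A outperforms C despite lower potential
print(gain_low > gain_high)     # gains shrink as the environment improves
```

Any other negatively accelerated function would serve equally well. Under this assumed curve, a performance score alone fixes neither the potential nor the environment, since many combinations of the two produce the same value.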

Having a reaction range does not mean 
that you use it. Cognitive skills are acquired 
by investing time and energy. Willingness to 
invest depends upon one’s perception of the 
likely outcome of the investment. I propose 
the following conjecture. 

The Challenge Hypothesis:

Intelligence is developed by engaging in cognitively
challenging activities. Environments vary in the extent to
which they support such challenges, and individuals vary in
the extent to which they seek them out. The development of
intelligence depends upon the extent to which the individual
wishes, is allowed to, or is required to meet environmental
challenges.

Consider one of the linchpins of human 
cognition - language. Every human being is 
required to learn a spoken first language, 
and all normal people do so. Only in the 
last two hundred years have societies begun 
to demand literacy, which is a much harder 
skill to acquire. It is worth the effort. A liter¬ 
ate person has acquired a cognitive skill that 
makes him or her more intelligent than an 
illiterate one with an identical, but unreal¬ 
ized, biological potential for intelligence. 28 

The example expands. Literate societies 
dominate the globe. They have developed 
formal education systems that force fur¬ 
ther intellectual development. Mechanisms 
of education such as the school, the news¬ 
paper, and the World Wide Web are differ¬ 
ent from, and far more challenging than, the 
educational systems of the nineteenth cen¬ 
tury. As a result, today’s children are more 
intelligent than children in the past. As a 
reflection of this phenomenon, IQ scores 
rose throughout the twentieth century. 

Literacy is an example of a compulsory 
challenge; individuals have to meet it or suf¬ 
fer severe consequences. In other cases there 
may be options. Learning about statistics 
and probability provides a good example. 
On a population basis, relatively few peo¬ 
ple study statistics. Students who graduate 
from, and understand, elementary statistics 
courses can solve problems that Pascal and 
Gauss could not solve. I do not think that the 
typical student in a modern statistics class 
has the biological potential for intelligence 
that Pascal and Gauss had. There is a sense 
in which they are more intelligent, for they 
are more powerful problem solvers. 

Robert Sternberg has identified three ways a person has of
responding to cognitive challenges.

1. Adapting: Changing your own cognitive behaviors to meet
the challenge. Studying is a way of adapting, though
certainly not the only one.

2. Shaping: Changing the environment to adapt it to your
current capabilities. If you have difficulty doing
arithmetic, buy a hand calculator.

3. Selecting: Finding a new environment that does not present
the challenge you want to avoid. If you are a college
student majoring in engineering, but math classes are
difficult for you, consider switching majors.

28 See Wolf (2007) for a very well-argued, extensive
expansion on this point.

Sternberg’s three strategies have differ¬ 
ent cognitive demands of their own. Both 
adapting and shaping require a willingness 
to engage with the environment. This is in 
itself a reliable individual trait. Philip Acker¬ 
man, a professor at the Georgia Institute of 
Technology, has conducted research show¬ 
ing that willingness to engage in intellectual 
challenge is characteristic of the people who 
hold the occupations and avocations that we 
think of as requiring intelligence. 29 Select¬ 
ing can be rational in some situations, but it 
has the danger of becoming a way to avoid 
intellectual challenge, and hence, to avoid 
the development of one’s biological poten¬ 
tial for intelligence. 

1.6. Intelligence Is Part of a System 

Defining intelligence solely in terms of test 
performance is an impoverished view. It 
focuses our attention on explaining varia¬ 
tions in test scores, which are not impor¬ 
tant in themselves, at the expense of 
studying individual variations in socially rel¬ 
evant behavior, which are important. How¬ 
ever, it would be foolish to disregard the 
considerable amount of information about 
intelligence that is incorporated in test 
scores. 

One of the major reasons for studying intelligence is to
understand how individual differences in cognitive competence,
intelligence in the broad sense, are related to individual
differences in the display of socially relevant behaviors. In
practice, there are two problems. Variations in socially
relevant performance are determined by noncognitive as well as
cognitive factors. Success (or failure) is based on both “can
do” and “will do.” In addition, the display of socially
relevant behaviors depends upon the opportunity to display
them, which may be quite beyond a person’s control, no matter
what his or her personal characteristics are. Here are two
historical examples.

29 Ackerman, 1996; Ackerman & Beier, 2005.

In the early twentieth century Joseph P. 
Kennedy, an American financier, amassed 
a considerable fortune. Subsequently three 
of his sons - John, Robert, and Edward - 
became United States Senators. In 1960
John became President of the United States. 
One can argue, probably correctly, that 
Joseph Kennedy provided his sons the 
genetic capability to become very intelli¬ 
gent. But his fortune certainly helped their 
political careers, if only to relieve them of 
the need to earn a living outside of public 
service. 

Now let us look at an example of restric¬ 
tion. The Declaration of Independence of 
the United States, written in 1776, contains 
the statement, “All men are created equal.” 
At that time women were disenfranchised, 
and the slavery of Africans was condoned. 
Over two hundred years later Condoleezza
Rice, an African American woman, who had
served in many high-level positions, including
Secretary of State, observed that when
the Declaration was written, “They weren’t
talking about me.” Dr. Rice’s impressive
achievements, which depended very largely
on her intelligence, would have been impossible
in 1776.

At the start of the twenty-first century the 
Darfur region of the Sudan was wracked by 
drought, famine, and vicious ethnic warfare. 
Children born in Darfur in the year 2003, no 
matter what their genetic potential, did not 
have good life prospects. 

The point of these examples is that both the causes and
effects of intelligence are embedded in a matrix of other
variables. This makes good research on intelligence hard to
do. The task is difficult, but it is not impossible. While an
ideal study may be impossible to implement, a great deal can
be learned from less-than-ideal studies. Progress can be made
by investigating the issues that can be studied, and hopefully
by holding down excessive interpretations and conclusions
where we do not have the right evidence.

It is fairly easy to determine the corre¬ 
lation between test scores and other mea¬ 
sures of interest, such as grade point average 
(GPA) and performance at work. Such stud¬ 
ies provide important data. However, they 
bias our knowledge toward finding out the 
role of intelligence in certain institutions, 
such as the schools and the military, that, 
while certainly important, are not the whole 
of society. In addition, the study of bivari¬ 
ate correlations, in isolation, fails to stress 
an important fact. Intelligence is just one of 
the variables in the system defined by human 
society. What does this mean? 

A system is a set of interdependent variables, in which each
variable influences the others. In closed systems the
interdependence is complete. The value of each variable is
completely determined by the other variables in the system. In
open systems some variables are influenced by conditions
outside the system. The real-world systems we study are always
open. Therefore, it is important to distinguish between system
variables, which exert measured reciprocal influences on each
other, and external variables, which influence the system
variables but are not (to any great extent) influenced by the
system variables. 30

To illustrate, consider a hypothetical 
study of the roles of genetics, family 
influence, and intelligence during primary 
school, middle school, and high school. The 
following relations hold: 

1. A child’s intelligence on the first day of school has been
determined by genetic inheritance and family environment
prior to entering school.

2. Intelligence on entering middle school is determined by
intelligence on entering primary school, the quality of the
primary school, and family environment during primary
school, but not by the environment prior to entry into
primary school.

3. A child’s intelligence on entering high school is
determined by intelligence on entering middle school,
family environment during middle school, and the quality of
the middle school.

4. At any time family environment is determined by prior
family environment and by the child’s intelligence.
(People, including children, influence their environment.)

30 In economics the terms endogenous and exogenous
variables are used.

Figure 1.7. A systems view of the relationship between
intelligence, family environment, and the quality of schooling.
Intelligence and family environment are system variables. Genetic
heritage and quality of school are external variables. Unknown
sources that influence the system variables are indicated by a
question mark.

Figure 1.7 diagrams the system. Intelli¬ 
gence and family environment are system 
variables, because they influence each other. 
Genetic potential and quality of schooling 
are external variables, because they exert 
influences on other measures but are not 
influenced by them. If we can obtain mea¬ 
surements of all these variables, we can use 
modern statistical methods to evaluate the 
relative influences of each variable upon the 
others. 
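
To make the four relations concrete, here is a minimal simulation sketch of the system in Figure 1.7. It assumes, for illustration only, that every variable sits on a common scale centered on 100 and that each relation is a weighted sum. The function name, the weights, and the noise term standing in for the “?” inputs are all hypothetical; they are not estimates from any study.

```python
import random


def simulate_child(genetic, family_start, primary_quality, middle_quality,
                   noise_sd=0.0, seed=0):
    """Return intelligence at entry to primary, middle, and high school.

    All weights are illustrative assumptions, not estimates from data.
    """
    rng = random.Random(seed)

    def unknown():
        # Stand-in for the unmeasured "?" influences in the figure.
        return rng.gauss(0.0, noise_sd)

    family = family_start
    # Relation 1: genetics and preschool family environment.
    iq_primary = 0.5 * genetic + 0.5 * family + unknown()

    # Relation 4: family environment responds to the child's intelligence.
    family = 0.8 * family + 0.2 * iq_primary + unknown()
    # Relation 2: prior intelligence, primary school quality, family.
    iq_middle = (0.7 * iq_primary + 0.15 * primary_quality
                 + 0.15 * family + unknown())

    # Relation 4 again, one period later.
    family = 0.8 * family + 0.2 * iq_middle + unknown()
    # Relation 3: prior intelligence, middle school quality, family.
    iq_high = (0.7 * iq_middle + 0.15 * middle_quality
               + 0.15 * family + unknown())
    return iq_primary, iq_middle, iq_high


baseline = simulate_child(100, 100, 100, 100)
better_school = simulate_child(100, 100, 115, 100)
print(baseline)
print(better_school)
```

With the noise switched off, a better primary school leaves intelligence at school entry untouched but raises it at middle school entry. That gain then carries forward to high school entry, both directly and through the family environment. In a real analysis the weights would be estimated from measurements, not assumed.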

Then we come to the external, unmeasured variables. Figure 1.7
makes no provision for extrafamilial influences on the family
environment, such as financial emergencies. Nor is there any
provision for extrafamilial and extra-educational influences
on intelligence, such as physical injury. Therefore, we have
to allow for the “unknown unknowns,” the influence of
unmeasured external variables. These are indicated by the “?”
symbols in the figure. While we cannot identify these
variables, modern statistical techniques do allow us to
estimate the size of their influence compared to the influence
of the measured system variables on each other. Hopefully the
influence of the “unknown unknowns” will be small; if it is
large, analyses within the system will account for only a
small part of what we need to know.

We can learn a great deal by comparing systems models to each
other. The intelligence-education system displayed in Figure
1.7 treats genetic influences as a one-shot effect - genetics
influences intelligence prior to entering school - but has no
direct influence subsequently. In fact, though, some genetic
effects unfold over time. For example, individual differences
in the rate at which connections are developed in the
forebrain during adolescence may result in differences in the
ability to control impulsive behavior, which may influence how
much a student learns in middle school, and so forth. These
influences could be modeled by extending arrows from the
genetic inheritance box to the boxes representing measurements
of intelligence at each time period, not just at the point of
school entrance. The extension is needed only if the added
arrows produce a system that (reliably) explains more of the
variation in intelligence than did the simpler system.

System analyses of this sort allow 
researchers to move far beyond arguments 
over the meaning of a correlation between 
IQ scores and just one other variable, in iso¬ 
lation from the system in which both occur. 

Still, problems remain. Systems incorpo¬ 
rating human intelligence are so compli¬ 
cated that no one study can ever include 
adequate measures of all relevant variables. 
The size of the influence of unknown vari¬ 
ables can be evaluated, but what those vari¬ 
ables are and how they exert their influence 
will remain unknown. Therefore, we cannot 
explain all the causes and ramifications of 
intelligence, all at once. We can identify and 
analyze reasonably closed subsystems, deal¬ 
ing with a particular aspect of intelligence. 
No one such study will tell us all about intel¬ 
ligence, but, taken together, they will tell us 
quite a lot. 

1.7. Summary and Prospectus 

Individual differences in cognitive capacities - intelligence,
for short - are an important part of human variation.
Intelligence is partially tapped by intelligence tests (IQ)
and by other tests of cognitive achievement, but other
important cognitive abilities lie outside of the tested realm.
While intelligence does not completely determine success in
life (or lack of the same), it certainly contributes to it. In
order to get a complete picture we have to consider
personality and motivational factors, and the extent to which
the social and physical environments encourage some behaviors
and discourage others.

Intelligence has both multiple causes and 
multiple consequences. In order to study 
intelligence it is necessary to isolate rela¬ 
tively closed systems of variables. This poses 
a challenge, because our ability to mea¬ 
sure some variables, and hence to study sys¬ 
tems involving them, is much greater than 
our ability to measure other variables. Test 
scores and measures of genetic variation are 
much easier to obtain than measures of suc¬ 
cess in society, or measures of variation in 
the physical and social environment. There¬ 
fore, we have to be vigilant against the error 
of studying what is easy to analyze, at the 
expense of missing amorphous but impor¬ 
tant effects. This tension will be reflected 
throughout the discussions in this book. 

Some typical tests will be described in 
Chapter 2. Subsequent chapters deal with 
the description, causes, and consequences of 
intelligence. The book closes with a discus¬ 
sion of how intelligence is distributed across 
our society and, finally, some speculations 
about the development of research on intel¬ 
ligence in the future. 


CHAPTER 2 


The Tests 


I agree with you that there is a natural 
aristocracy among men. . . . the natural
aristocracy I consider as the most precious 
gift of nature for the instruction, the trusts, 
and the government of society... 

Thomas Jefferson, letter to John 
Adams, 1813 1 

2.1. Introduction 

Thomas Jefferson and John Adams, the men who wrote in the
Declaration of Independence that “all men are created equal,”
believed in a natural aristocracy! They weren’t hypocrites.
Jefferson and Adams believed that all men had equal rights,
not that they had equal talents. Adams’s lifelong support of
his alma mater, Harvard University, and Jefferson’s investment
in the University of Virginia showed how strongly both men
supported the development of the natural aristocracy. However,
neither Jefferson nor Adams said how the natural aristocrats
were to be identified. Today we partially rely on tests to do
this.

1 Quoted in Lemann, 1999.

We all have an intuitive feeling about 
what a cognitive test is, for you are unlikely 
to grow up in post-industrial society without 
taking one. This chapter expands on intu¬ 
itive notions by describing some of the typi¬ 
cal tests, chosen to represent different types, 
and by commenting on the aspects of intel¬ 
ligence that they do, or do not, evaluate. 

Most textbooks on testing stress the dis¬ 
tinction between individual testing and test¬ 
ing people in groups. An equally important 
distinction is between tests composed of bat¬ 
teries of subtests and single-format tests. 

Test batteries contain subtests that evaluate different
aspects of cognition. For instance, many test batteries
contain, as subtests, vocabulary tests and tests of simple
arithmetic. Overall scores are constructed by combining
subtest scores. The way the widely used Wechsler test
batteries are scored is illustrative. The Wechsler batteries
contain subtests involving verbal and nonverbal material.
Subtest scores are combined to obtain verbal IQ (VIQ) and
performance IQ (PIQ) scores, and then verbal and performance
scores are combined into a full-scale IQ (FSIQ) score.

The SAT, a college admission test that 
will be familiar to many readers, is a battery 
of subtests that evaluate different aspects of 
verbal and mathematical reasoning. We will 
look at this test in more detail in section 2.3.1 
of this chapter. 

Single-format tests are tests in which all questions are of
the same type. Thus a single-format test looks very much like
one of the subtests of a test battery, used alone. For
instance, vocabulary tests are sometimes used alone, to get a
rough indication of a person’s general cognitive power.
Obviously a single-format test does not evaluate as many
cognitive talents as a test battery does. However, if scores
on a single-format test are highly correlated with scores on
the test battery as a whole (as vocabulary scores and VIQ
scores are for the Wechsler tests), an estimate of a person’s
intelligence can be made, with some loss of accuracy and great
reduction in cost, by using a single-format test that takes a
few minutes instead of a test battery lasting several hours.

There is no board that certifies a test as 
an “intelligence test.” Indeed, only a few 
of the many cognitive tests on the market 
are specifically labeled tests of intelligence. 
Why? The answer to this question requires a 
look at different tests, and a brief excursion 
into what might be called “political seman¬ 
tics.” The case of the SAT provides a good 
example. 

The SAT evaluates cognition. However, the Educational Testing
Service (ETS), the company that constructs the SAT, never
calls it an intelligence test. ETS says that the test
evaluates the extent to which a person who has an American
high school education is prepared to be a college
undergraduate. This sounds very much like Binet’s goal for the
very first intelligence test, which was designed to assess
children’s readiness for the French public education system. 2
So why isn’t the SAT an intelligence test? In fact, some
researchers treat the SAT as virtually synonymous with a
measure of intelligence, no matter what ETS says. 3 The same
thing is true of several other cognitive tests. Knowledgeable
people call them intelligence tests, but their publishers do
not.

2 Binet & Simon, 1905.

What we have here is a problem of defini¬ 
tions, tempered by some political controver¬ 
sies. Historically, and contemporaneously, 
certain tests have been called intelligence 
tests. These include Terman’s translation of 
Binet’s tests, the Stanford-Binet test, and 
David Wechsler's Wechsler scales, to be 
described later. Both are battery-type tests 
intended to summarize the results of an eval¬ 
uation of a wide range of cognitive skills, 
and to be appropriate for many different 
populations. 4 

These individually administered, battery- 
type tests did not fill two important niches. 
They were too expensive for mass personnel 
screening programs, such as military recruit¬ 
ment. Therefore, group-administered tests, 
such as the Army Alpha and the SAT, were 
developed. There was also a need for spe¬ 
cialized tests, to be used when the examiner 
was interested in only a limited range of cog¬ 
nitive abilities or when the test was intended 
for use in a particular population. 

Some comparisons involving the SAT 
illustrate all these issues. The SAT is a group 
test, while the Wechsler intelligence tests 
are individually administered. It costs much 
less to evaluate a person using the SAT than 
it does using the Wechsler. The Wechsler 
scales cover a greater range of cognitive 
skills than the SAT does. For instance, the 
Wechsler scales include subtests to evaluate
visual-spatial reasoning. This is a reasonable
thing to do, for visual-spatial reasoning
is certainly part of human cognition. The 
SAT does not contain visual-spatial reason¬ 
ing tests, on the grounds that this ability is 
not required throughout the college curricu¬ 
lum, while verbal and logical reasoning abil¬ 
ities are required everywhere. 5 

3 E.g., Jackson & Rushton, 2006. 

4 Matarazzo, 1972; Wechsler, 1975. 

5 Perhaps it should. A case can be made for consid¬ 
ering visual-spatial reasoning as an ability required 
in majors such as art, architecture, and engineer¬ 
ing. See Humphreys & Lubinski, 1996, for a more 
extended discussion. 
Now consider the SAT compared to
the Armed Services Vocational Aptitude 
Battery (ASVAB). Both the SAT and the 
ASVAB contain subtests evaluating lan¬ 
guage skills. The questions on the SAT are 
harder than comparable questions on the 
ASVAB, on the grounds that a higher level 
of language skill is required to succeed as a 
college student than to succeed as a military 
enlisted person. Both tests evaluate linguis¬ 
tic skills, but in different ranges. 

Here is a more extreme illustration of tests intended to
evaluate different ranges of intelligence. The Mini-Mental
State Examination is a test widely used in medical practice to
determine if a person is seriously cognitively impaired. It
contains questions like “What day is it?” and “Where are you?”
At the other end of the scale, the Miller Analogies Test,
intended to screen entrants to graduate school, contains some
very hard analogical reasoning questions.

Given this variety of uses, it makes sense 
to describe a test by the use for which it is 
intended, instead of using the omnibus term 
“intelligence test.” 

A second reason for avoiding the words 
“intelligence test” is dictated more by pub¬ 
lic relations than by a desire to provide an 
accurate description. The very word intel¬ 
ligence has a negative connotation to some 
people. Some social commentators associate 
the term with elitism and even racism. The 
reasons for this are discussed in Chapters 8 
and 11. As a result some test producers deny 
that they are evaluating intelligence, resorting
to terms like “ability” and “aptitude,”
both to avoid controversy and to increase 
marketability. 

While the proliferation of test names is understandable, it
has resulted in a good deal of inconsistency. In 2006 the
authors of an article in the journal Intelligence extracted a
score from the SAT that they referred to as an index of
general intelligence. 6 Two issues later in the same journal,
other authors referred to the SAT as a test of scholastic
achievement, and tried to predict SAT scores using scores on a
nonverbal, academic-content-free test called the Raven
Advanced Progressive Matrices (RAPM). The RAPM did pretty well
as a predictor (correlation of .39), but a vocabulary test did
substantially better (correlation of .69). 7

6 Jackson & Rushton, 2006.

This example shows, in microcosm, what 
the situation is. On the one hand, virtu¬ 
ally all tests of cognitive abilities correlate 
positively with each other. This shows that 
there is a tendency for cognitive skills to vary 
together, across people. If you have a high 
level of one cognitive skill, you probably do 
not have a low level on another. On the 
other hand, the associations between cog¬ 
nitive skills are far less than perfect, so there 
is a case for specialized tests in appropriate 
situations. 

With these general remarks aside, let us 
look at some of the tests, as they could be 
described by data available in the 2007-09 
period. New tests and revisions are pub¬ 
lished frequently, but they all are designed 
on the same basic principles. 

2.2. A Description of Individually 
Administered Test Batteries 

Individual testing is generally appropriate if 
there is concern about the mental capabili¬ 
ties of the examinee. For instance, in educa¬ 
tion a child might be examined for special 
placement in a remedial (“special”) educa¬ 
tion program or, less frequently, for admis¬ 
sion to a program for gifted children. Tests 
used for this purpose include the Stanford- 
Binet, a much-updated version of Terman’s 
translation of Binet’s tests, and the Wechsler 
Intelligence Scale for Children (WISC). 

Adults are tested individually as part of 
a clinical assessment for a variety of behav¬ 
ioral problems. The judicial system some¬ 
times uses intelligence testing to determine 
whether or not the examinee is sufficiently 
intelligent to be held criminally responsi¬ 
ble for his or her (illegal) actions. Individ¬ 
ual testing is also used in order to assess an 
adult’s mental status following injury to the 
brain. 

7 Rohde & Thomson, 2007. 
Intelligence tests are also widely used in
research on individual differences in cog¬ 
nition. Researchers tend to prefer the less 
expensive group-administered tests, simply 
to save money. In some situations, however, 
individual testing is appropriate. As an alter¬ 
native, it is often possible to administer some 
of the subtests of an individual test on a 
group basis, for a special purpose, such as 
the evaluation of vocabulary. 

2.2.1. The Wechsler Tests: The Adult
Intelligence Scale (WAIS-IV) and the
Wechsler Intelligence Scale for Children
(WISC-IV)

The Wechsler tests for adults and children 
(the WAIS and the WISC) are by far the 
best-known individual tests. They are regu¬ 
larly revised, as are other intelligence tests. 
Early versions of the WAIS tests provided 
three scores: a Verbal IQ (VIQ) score calculated
from scores on subtests involving
language, a Performance IQ (PIQ) score
based on scores from subtests that did not
involve language, and a Full Scale IQ (FSIQ) 
based on a combination of the VIQ and PIQ 
scores. A memory scale could also be 
calculated. These scales, which are based 
on a pragmatic division of cognitive skills 
into verbal and nonverbal skills, are often 
reported in all but the most recent literature. 

The current (WAIS-IV) test has a some¬ 
what different structure, which reflects 
recent theoretical developments, and in par¬ 
ticular the importance of individual differ¬ 
ences in the speed of cognitive process¬ 
ing and in the ability to keep information 
in immediate memory while working on a 
problem. The subtests are grouped into tests
involving verbal comprehension, perceptual 
(visual) reasoning, immediate (“working”) 
memory, and speed of processing. An index 
is calculated for each of the groups. The Full 
Scale IQ score is calculated by combining 
the indexes. Within each of the groups there 
are core subtests, which are used to calcu¬ 
late the appropriate index, and supplemen¬ 
tal tests that can be utilized if the exam¬ 
iner wishes to probe the examinee’s abilities 
within a particular area. 
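
The grouping just described can be pictured as a small data structure. The sketch below is only an illustration under stated assumptions: the core subtests are those listed in Table 2.1, but the function names and the example scores are hypothetical, and a plain sum stands in for the publisher's norm tables, which are what actually convert subtest scores into index and Full Scale IQ values.

```python
# Core subtests per group, as listed in Table 2.1 (supplemental tests omitted).
CORE_SUBTESTS = {
    "verbal_comprehension": ["Vocabulary", "Similarities", "Information"],
    "working_memory": ["Digit Span", "Arithmetic"],
    "perceptual_reasoning": ["Block design", "Matrix reasoning", "Visual puzzles"],
    "processing_speed": ["Symbol search", "Coding"],
}


def index_sums(scaled_scores):
    """Sum the core subtest scores within each group.

    Assumption: a plain sum stands in for the norm-table lookup
    that the real test uses to produce an index score.
    """
    return {
        group: sum(scaled_scores[name] for name in subtests)
        for group, subtests in CORE_SUBTESTS.items()
    }


def full_scale_sum(scaled_scores):
    """Combine the four group sums into a single overall total."""
    return sum(index_sums(scaled_scores).values())


# Hypothetical examinee who scores 10 on every core subtest.
example_scores = {
    name: 10 for subtests in CORE_SUBTESTS.values() for name in subtests
}
example_index_sums = index_sums(example_scores)
example_total = full_scale_sum(example_scores)
```

The point of the sketch is the two-level structure: supplemental tests never enter the indexes, and the overall score is built from the indexes rather than directly from the individual subtests.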


Table 2.1 lists the core tests used within 
each group. It also includes brief descrip¬ 
tions of the cognitive skills required to com¬ 
plete each test. 

Like the Stanford-Binet, the Wechsler 
tests represent a pragmatic approach to 
intelligence. They incorporate a widely 
accepted distinction between verbal and 
nonverbal reasoning. Otherwise the tests 
and subtests have evolved, and theories 
of intelligence have been induced from 
research on them, rather than psychologists 
having used a theory to generate the tests. 

2.2.2. Two Individual Tests Motivated by 
a Psychological Theory 

The pragmatic approach to theory exempli¬ 
fied by the early Wechsler tests contrasts 
with a more theory-based approach taken 
in the development of two other widely 
used tests: the Kaufman intelligence tests
(one for adults and one for children) and 
the Woodcock-Johnson test, which is used 
largely for adult self-assessment as an aid in 
career planning. These tests are based on a 
theoretical distinction between crystallized 
and fluid intelligence, first articulated by 
Raymond Cattell and developed further by 
his colleague John Horn. 8 

Cattell and Horn distinguished between 
solving problems by applying previously 
learned knowledge and/or problem-solving 
methods (crystallized intelligence, Gc) and 
solving problems by applying general rea¬ 
soning methods to figure out a solution 
to an unfamiliar problem (fluid intelligence,
Gf). This echoes Juan Huarte de
San Juan’s sixteenth-century distinction 
between problem solving based on mem¬ 
ory or imagination, 9 and also resembles 
Spearman's distinction between inductive 
and deductive reasoning. Cattell and Horn 

8 Cattell, 1971, 1987; Horn, 1985; Horn & Noll, 1994.

9 Cattell and Horn's ideas were developed indepen¬ 
dently of Huarte’s work. Huarte's book on intelli¬ 
gence was first translated into English in 1594, but it 
appears to have had little influence on contempo¬ 
rary English-speaking psychologists. His ideas were 
reintroduced to us by Spanish psychologists in the 
1980s. 
Table 2.1. The subscales of the Wechsler Adult Intelligence Scale-IV (supplemental
scales are indicated)

Each entry gives the name of the test, its content, and the cognitive skill evaluated.


Tests used to compute the verbal comprehension scale

Vocabulary
Content: Word definitions required
Cognitive skill evaluated: Vocabulary

Similarities
Content: Given names of objects, explain how they are alike
Cognitive skill evaluated: Abstract reasoning. Detection of similarities

Information
Content: Questions based on general cultural knowledge
Cognitive skill evaluated: Knowledge of culture

Comprehension (supplemental)
Content: Oral questions requiring solutions to everyday problems
Cognitive skill evaluated: Verbal comprehension, knowledge of culture


Tests used to compute the working memory scale

Digit Span
Content: Repeat a series of digits either forward or backward
Cognitive skill evaluated: Short-term storage of information

Arithmetic
Content: Simple arithmetic problems are to be solved “in the head.”
Cognitive skill evaluated: Manipulation of information in immediate memory

Letter-number sequencing (supplemental)
Content: Examinee hears a mixed sequence of letters and numbers, and repeats them
back separately in their normal sequence
Cognitive skill evaluated: Manipulation of information in immediate memory


Tests used to compute the perceptual reasoning scale

Block design
Content: Assemble a set of colored cubes to form a specified design
Cognitive skill evaluated: Nonverbal reasoning. Visualizing object movements

Matrix reasoning
Content: Recognize a pattern of changes in visual figures
Cognitive skill evaluated: Pattern recognition and application

Visual puzzles
Content: Choose three of six pieces to make a specified design
Cognitive skill evaluated: Visualizing movement of objects. Visual immediate memory

Picture completion (supplemental)
Content: An incomplete picture of a common object is shown. The task is to say what is
missing
Cognitive skill evaluated: Nonverbal reasoning. General knowledge

Figure weights (supplemental)
Content: A balance scale is shown, with some objects in it. The task is to choose other
objects to balance the scale. Object weights are to be inferred from examples
of balanced scales
Cognitive skill evaluated: Nonverbal reasoning. The examinee is forced to deal with
an unusual task


Tests used to compute the processing speed index

Symbol search
Content: Indicate whether or not designated target symbols appear in a group of symbols
Cognitive skill evaluated: Speed of visual processing. Short-term memory

Coding
Content: The examinee is shown an arbitrary pairing of marks and numbers. The
examinee is then shown a set of marks and must list the associated numbers
Cognitive skill evaluated: Paired associates learning. Speed of visual processing

Cancellation (supplemental)
Content: The examiner describes values of attributes of objects, e.g., shape and
color. The examinee must indicate which shapes in a set have the required
combination of attribute values
Cognitive skill evaluated: Speed of visual processing. Also, the ability to disregard
distracters, such as a shape that has one of the required attribute values
(e.g., a red circle when the target shape is a red triangle)
Table 2.2. The subtests of the Kaufman Adult Intelligence Test


Name of Test Description of Activity (see note a)


Crystallized (Gc) tests 
Definitions 

A sentence is given with a missing word. Some letters of the missing word 
are given. The task is to find the missing word. Example: The jockey was 
riding a _ O _ S _.

Auditory comprehension 

The examinee listens to a brief recording, and then answers questions that 
require either recall of information in the recording or drawing an 
inference from the message. 

Double meanings 

The examinee is given two sets of word clues. The task is to find a word 
that has two meanings, associated with each of them. Example: Clues: 
“Sport & Champion,” “Music & Singing.” Answer: Record. 

Famous people 

(used for extended testing) 

The examinee is shown a photograph of a famous person and a brief 
statement about that person. The task is to name the person. 

Fluid (Gf) tests 

Rebus 

The examinee is shown a “sentence” made up of pictures that stand for 
words. The task is to “read” the sentence. Example: Show cartoon pictures 
of an eye, a tomato soup can, and a fly. The sentence is “I can fly.” 

Mystery codes 

The examinee is shown a set of pictures, each with a “code meaning,” e.g., 
a large triangle representing “woman” and a black dot representing “new.” 
The examinee is then shown a triangle with a black dot inside. The task is 
to determine what the meaning of the new symbol is (“girl”). 

Logical steps 

The examinee hears or reads a set of premises, and then must deduce the 
answer to a question. Example: “At the Round Table, Gawain is to 
Arthur’s left and Launcelot is between Gawain and Arthur. Who is to 
Launcelot’s left?” 

Memory for block designs 
(used for extended testing) 

The examinee is shown a pattern composed of colored blocks. The 
examinee must then recreate the pattern from memory, using blocks 
and a form board. 


a The examples are not of actual test items, in order to maintain the confidentiality of the test. 


deserve credit for carrying this idea consid¬ 
erably further than their predecessors did. 

Table 2.2 presents the subtests of the
Kaufman Adult Intelligence Test (KAIT),
grouped into subtests used to generate the
Gc and Gf scores. As in the case of the 
WAIS, a “full-scale” score can also be gener¬ 
ated. The Woodcock-Johnson test also gen¬ 
erates separate scales for Gc and Gf, along 
with a full-scale IQ score. 

2.2.3. A Test Motivated by a Theory 
of How the Brain Works 

The Gc/Gf model and the distinction 
between verbal and nonverbal reasoning are 


models of how the mind works, for there is 
no direct connection to brain mechanisms. 
(In Chapter 7 we look at how such con¬ 
nections can be made.) The next test to 
be described, the Cognitive Assessment Sys¬ 
tem (CAS), was motivated by a theory of 
how the brain works. The theory and related 
tests were developed by J. P. Das, a pro¬ 
fessor at the University of Alberta, and his 
colleagues. 10 

Shortly after World War II, Das, a native 
of India, had the opportunity to work in 
the then Soviet Union, with the noted 
neuropsychologist A. R. Luria. Luria had 

10 Das, Naglieri, & Kirby, 1994. 
Figure 2.1. The four major lobes of the cerebral
cortex. Copyright Matthew Holt, used by 
permission of the Washington University of 
St. Louis Internet Stroke Center. 


examined a very large number of brain 
injury cases. He was struck by the fact 
that damage to different areas of the brain 
produced quite different behavioral deficits. 
Luria concluded that the four major areas of 
the cerebral cortex, the frontal, temporal, 
parietal, and occipital lobes (see Figure 2.1), 
are specialized for different types of infor¬ 
mation processing. 11 

It had already been determined that the 
occipital lobe is largely concerned with the 
mechanics of vision. Luria believed that 
the temporal lobe is concerned with pro¬ 
cessing information in series (e.g., compre¬ 
hending a sentence as it is spoken), that the 
parietal lobe is concerned with parallel pro¬ 
cessing, as when one notices the symmetry 
of a figure, and that the frontal lobe is con¬ 
cerned with planning actions and with the 
control of attention. 

Luria was mainly concerned with the 
results of substantial injuries to the brain. 
Das extended Luria’s ideas to the anal¬ 
ysis of normal behavior by healthy chil¬ 
dren. He reasoned that if different areas 
of the brain carry out planning, attention 
control, and serial and simultaneous func¬ 
tions, then individual differences in behav¬ 
ior within the normal range should reflect 
individual differences in the development 
of each of the four brain areas. Das coined
the acronym PASS for the resulting Planning,
Attention, Serial, and Simultaneous model, and
constructed a number of tests to evaluate
each function. Das’s colleague Jack Naglieri
subsequently developed a comprehensive
test battery, the Cognitive Assessment System
(CAS), incorporating subtests for the
four dimensions of the PASS theory. 12

Table 2.3 shows the subtests used in the 
CAS. The score on each subtest is based on 
both the speed and accuracy with which a 
child answers questions. An effort was made 
to construct test items that were as nearly as 
possible void of cultural content. This, and 
the emphasis on the speed with which sim¬ 
ple tasks are done, is consistent with Das’s 
and Naglieri's desire to evaluate the effec¬ 
tiveness of brain processes rather than men¬ 
tal competencies tied to cultural knowledge. 

Although Das was inspired by Luria’s 
neuropsychological work, it is not clear that 
the CAS scores map onto discrete brain 
functions, for our understanding of the brain 
is greater than it was in Luria’s day. The 
PASS model is stated in terms of cognitive 
behaviors: planning, attending, and process¬ 
ing information in either a parallel or serial 
manner. These are characteristics of the 
mind as an information-processing device, 
not characteristics of the brain as a physical 
device. The mapping to brain locations is 
not quite what Luria’s theory predicts. For 
instance, tasks that involve planning draw 
heavily upon both frontal lobe and parietal 
lobe mechanisms. 13 Also, the Das-Naglieri
assignment of tasks to the four PASS func¬ 
tions seems (to me) to be sometimes insight¬ 
ful and sometimes arbitrary. As an example 
of a good assignment of task to function, the 
planned connections task is a variant of a
“trail making” task, which is widely used as 
an indication of damage to the frontal cor¬ 
tex. However, it is less clear that one should 
regard the matching numbers and planned 
codes tasks as examples of planning. 

In the last analysis, though, the proof of 
the pudding is in the eating. There are two 
questions that we have to ask about any 
proposed battery of tests. First, do the test 


11 Luria, 1962/1980. 


12 Naglieri, 2005. 

13 Jung & Haier, 2007. 
Table 2.3. The subtests of the Cognitive Assessment System


Tests of planning

Matching numbers

The examinee is shown rows of eight numbers. Two of the numbers
are identical. The task is to underline the identical numbers. Several
pages of rows are shown. The measure is the time needed to
complete the task.

Planned codes

The examinee is shown a set of letters and a code form for each; e.g.,
A = OX, B = XX. The examinee then examines a number of rows of
letters and replaces them with their code form.

Planned connections

The examinee is shown numbers arranged haphazardly on a page.
The task is to draw a line (“trail”) starting with the lowest number
and proceeding in order to the highest. In a harder problem, numbers
and letters are intermixed, e.g., 1-A-2-B.

Tests of attention

Expressive attention

This is a variety of the Stroop task, in which a person is shown words
in different-colored ink and must call out the color of the ink. The
words are color names, as in BLUE printed in red ink.

Receptive attention

The examinee is presented with pairs of letters, e.g., AB, Aa, AA. The
task is to identify which ones have the same name.

Simultaneous processing tasks

Matrix task

This is a progressive matrix task, similar to the matrix task on the
WAIS-R. It evaluates nonverbal reasoning.

Verbal-spatial relations

The examinee is shown six simple pictures and asked a question. The
examinee indicates which picture is the best answer to the question.

Figure memory

The examinee is shown a geometric pattern, and then must find it
embedded in a larger pattern.

Successive processing tasks

Word series

The examiner speaks an arbitrary string of words. The examinee
repeats the words in order.

Sentence repetition

The examiner speaks a syntactically correct sentence, which is
composed of color words and has no obvious semantic content, e.g.,
THE BROWN GREENS THE YELLOW. The examinee repeats the
sentence.

Sentence questions

The examiner speaks a sentence as in sentence repetition, then asks a
question about it, e.g., WHO GREENS THE YELLOW?


scores predict real-world cognitive behav¬ 
ior? Second, if the test is based on a the¬ 
ory, do the patterns of scores on the sub¬ 
tests behave in a manner predicted by the 
theory? In the case of the CAS, are subtests
within the Planning, Attention, Simultaneous,
and Successive groups more highly correlated
with other subtests within the same
group than they are with subtests in other 
groups? We postpone the answers to these 


questions, for the CAS and the other battery 
tests, until our discussion of psychometric 
theory, in Chapter 4. 
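
The second question can be made concrete with a small calculation. The sketch below is a minimal illustration, not the analysis reported for any actual battery: the scores are invented, the function names are hypothetical, and comparing average correlations is a simplified stand-in for the psychometric methods taken up later.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def within_and_between(scores, groups):
    """Return (mean within-group r, mean between-group r).

    scores: subtest name -> list of scores, one per examinee
    groups: subtest name -> name of the group it belongs to
    """
    within, between = [], []
    for a, b in combinations(scores, 2):
        r = pearson(scores[a], scores[b])
        (within if groups[a] == groups[b] else between).append(r)
    return sum(within) / len(within), sum(between) / len(between)


# Invented scores for five examinees on four subtests (two groups).
toy_scores = {
    "Matching numbers": [1, 2, 3, 4, 5],
    "Planned codes": [2, 4, 6, 8, 10],
    "Word series": [3, 1, 5, 2, 4],
    "Sentence repetition": [6, 2, 10, 4, 8],
}
toy_groups = {
    "Matching numbers": "planning",
    "Planned codes": "planning",
    "Word series": "successive",
    "Sentence repetition": "successive",
}
mean_within, mean_between = within_and_between(toy_scores, toy_groups)
```

If the theory behind a battery is right, the first number should clearly exceed the second. In these invented data it does, by construction; real data are rarely so tidy.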

2.2.4. A Commentary on Individually
Administered Tests

Individual testing offers two advantages over 
group testing: contingent item presentation
and supplementary reporting. 
Contingent item presentation simply
means that the next question that the exam¬ 
iner asks can be chosen on the basis of the 
answers that the examinee has already given. 
This is efficient and sensible, because the 
examiner avoids trying to find out some¬ 
thing about the examinee that the examiner 
already knows. If you are evaluating a per¬ 
son’s ability to do arithmetic and you have 
already seen the person multiply 24 by 72 “in 
the head,” there is not much point in asking 
if he or she can add 32 and 18! In an examina¬ 
tion of a person's knowledge of US history, if 
someone has already told you that Franklin 
Pierce was president of the US after Millard 
Fillmore, there is not much point in prob¬ 
ing to find out if they know who followed 
Franklin Roosevelt. In addition to its effi¬ 
ciency, contingent item presentation makes 
the examination more like an interview than 
an examination, for the question-response- 
next question sequence follows the rules 
of normal conversation. Wechsler himself 
referred to his scales as providing a struc¬ 
tured interview for the display of cognitive 
skills. 14 
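
The logic of contingent item presentation can be sketched in a few lines. This is a deliberately simplified illustration, not the procedure used by the Wechsler tests or any other published battery: it assumes that items are ordered by difficulty and that anyone who passes an item would also pass every easier one, and it uses a plain binary search. The function and variable names are hypothetical.

```python
def contingent_test(difficulties, answers_correctly):
    """Locate the hardest item an examinee passes, asking few questions.

    difficulties: item difficulties sorted from easiest to hardest
    answers_correctly: callable taking a difficulty, returning True/False
    Assumption: anyone who passes an item would also pass every easier one.
    Returns (index of hardest item passed or -1, list of items asked).
    """
    low, high = 0, len(difficulties) - 1
    hardest_passed = -1
    asked = []
    while low <= high:
        mid = (low + high) // 2
        asked.append(difficulties[mid])
        if answers_correctly(difficulties[mid]):
            hardest_passed = mid  # easier items are now uninformative
            low = mid + 1
        else:
            high = mid - 1        # harder items are now uninformative
    return hardest_passed, asked


# Hypothetical examinee who can handle items up to difficulty 6.
items = list(range(1, 11))  # difficulties 1 (easiest) to 10 (hardest)
level, questions = contingent_test(items, lambda d: d <= 6)
```

Here four questions do the work of ten. A human examiner applies the same reasoning less mechanically, which is part of what makes the session feel like a conversation.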

Contingent item presentation can also 
be obtained by computer-controlled test¬ 
ing, a procedure that is becoming increas¬ 
ingly common. Computer-controlled testing 
assumes that the examinee has basic com¬ 
puter skills and is motivated to cooperate. 
Both assumptions are probably correct for 
healthy adults and older children in industrial
and post-industrial societies. They are
suspect for young children, for people who are
infirm (e.g., the very elderly), and for people
outside the post-industrial world.

Examiners typically write reports describ¬ 
ing how the examinee reacted during an 
individual testing session. These comments 
can be very helpful. For example, if intelli¬ 
gence is being examined to determine recov¬ 
ery from a brain injury, the examinee’s 
behavior during testing may be as important 
as the test score. However, there are two 
qualifications. One is that the observations 
and interpretations made during testing are 
only as good as the observer, which raises 

14 Wechsler, 1975. 


the issue of variations in skill across examin¬ 
ers, as well as examinees. The other is that 
both the cost and the difficulty of scoring 
the examiner’s subjective impressions make 
it difficult to incorporate such observations 
into research programs. 

The costs of individual intelligence exam¬ 
inations are in the $200-350 range. 15 Some¬ 
times such costs are clearly justified. Sup¬ 
pose the purpose of the examination is to 
decide whether or not a student should be 
placed in a special (remedial) education pro¬ 
gram. The cost of supporting a student in 
an American special education program is 
about $17,000, more than twice the cost of 
supporting a student in a regular program. 16 
Keeping a student in a regular classroom 
when he or she needs special education can 
be a disaster, both for the student and for the 
teachers who must deal with a person who 
cannot keep pace with his or her classmates. 
In such a case individual testing is worth it, 
from the viewpoint both of the institution 
and of the person being examined. 

2.3. Group-Administered Test Batteries 

Individual examinations are not economi¬ 
cally feasible in many personnel selection 
systems. Consider some numbers. In 2005, 
US high schools graduated approximately 
three million people. Slightly under half 
of them took the SAT, the most widely 
used college entrance examination. 17 The 
US military screens over 100,000 potential 
recruits a year. A commercial testing com¬ 
pany that specializes in screening applicants 
for hourly-wage positions uses computer¬ 
ized testing to screen an average of 40,000 
people each workday. In such situations 
individual examinations are not an option. 
Group tests are needed. 
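
A rough calculation shows the scale of the problem. The sketch below combines figures quoted in the text: the $200-350 fee for an individual examination and the number of 2005 high school graduates. Treating exactly half of the graduates as SAT takers is an assumption that slightly overstates the count, since the text says only "slightly under half."

```python
# Figures quoted in the text; the SAT count is an upper bound on
# "slightly under half" of three million graduates.
graduates = 3_000_000
sat_takers_upper_bound = graduates // 2
fee_low, fee_high = 200, 350  # dollars per individual examination

cost_low = sat_takers_upper_bound * fee_low
cost_high = sat_takers_upper_bound * fee_high

print(f"Individual testing of SAT takers: ${cost_low:,} to ${cost_high:,}")
```

Even at the low end of the fee range, the bill for a single year's applicants runs to hundreds of millions of dollars.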


15 A fee of $250 was quoted on the website of the 
State University of New York, Stony Brook, Center 
for Psychological Services, January 2010. 

16 This figure is based on the National Education Asso¬ 
ciation's website, www.nea.org/specialed/index 
.html, 14 February, 2007. 

17 National Center for Education Statistics and
College Board figures for 2005.
Table 2.4. The Otis-Lennon Test of School Abilities: selected subtests

Scale or Subtest Description


Verbal tests

Comprehension

Antonyms

Given a base word, select from a list of target words the word
most nearly opposite in meaning.

Synonyms

Same as antonyms, except that the word to be selected should
have the same meaning as the base word.

Sentence completion

Select word needed to complete an incomplete sentence.

Verbal reasoning tests

Classification

Given a set of words, identify those that are in the same class.

Analogies

Given two words as a base, and a target word, identify the
relationship between the two base words and use it to select a
word that has the same relation to the target word. Example:
White is to black as up is to (sideways, below, down, behind).

Arithmetic reasoning with verbal problems

Word problems requiring simple computations.


Nonverbal tests

Figural reasoning

Figural analogies

An analogy test similar to verbal analogies, except that the
relations are between parts of geometric figures.

Matrix patterns

A matrix test using geometric figures (see the discussion of
progressive matrix tests in section 1.2).

Quantitative reasoning

Number series

Given a series of numbers, where each number bears some
relation to the preceding ones, identify the next number in the
sequence.

Number matrix

Matrix problems similar to those used for figures, but
constructed using relations between numbers (see text below).


Note: The Otis-Lennon Test formats vary with grade level; the examples here have been chosen to
illustrate the nature of the test.


2.3.1. The Otis-Lennon Test 

Most group-administered, battery-type 
intelligence tests carry over the pragmatic 
distinction between verbal and nonverbal 
skills that characterizes the Wechsler and 
Stanford-Binet tests. Tests of quantitative 
skills are often added. In spite of the 
similarity between these batteries and the 
individual tests, test developers seldom 
use the term intelligence. Instead the tests 
are described as tests of “reasoning” or 
“scholastic ability.” 


A good example is the Otis-Lennon Test 
of School Ability. This is a descendant of
Arthur Otis's Otis Group Intelligence Scale,
first published in 1918, with the latest revision
in 2009. Otis, who had studied with Terman,
realized that most of the items on tests
like the Stanford-Binet could be written in 
a form suitable for group administration. 18 
Accordingly he designed such tests for use 
in the schools. 


18 Robb, Bernardoni & Johnson, 1972. 
Table 2.5. The scales and subtests of the
Cognitive Abilities Test-3. (When the test 
is administered in classrooms, three class 
sessions are required, as indicated in the 
table) 


Session 1: 

Verbal Battery 

Test 1: Verbal Classification 


Test 2: Sentence Completion 
Test 3: Verbal Analogies 

Session 2: 

Quantitative Battery 

Test 4: Quantitative Relations 
Test 5: Number Series 

Test 6: Equation Building 

Session 3: 

Nonverbal Battery 

Test 7: Figure Classification 
Test 8: Figure Analogies 

Test 9: Figure Analysis 


The Otis-Lennon test is actually a testing
program. Below the fourth grade, children
are given individual tests. Age-appropriate,
group-administered tests
are provided for grades five through twelve.
Table 2.4 shows some of the subtests used. 
The test battery is divided into verbal and 
nonverbal sections. Within the verbal sec¬ 
tion a further distinction is made between 
verbal comprehension and verbal reason¬ 
ing. The nonverbal section is divided into 
tests intended to evaluate “figural reasoning” 
(relationships between visual patterns) and 
quantitative reasoning. The test as a whole 
takes about three hours, but it is typically 
taken in several sessions. 

2.3.2. The Cognitive Abilities Test 

A similar approach to testing is seen in 
the Cognitive Abilities Test (CAT-3, for
the third revision), developed by David 
Lohman, Robert Thorndike, and Elizabeth 
Hagen, and marketed through Riverside 
Press. The publishers claim that this test 
is the most widely used intelligence test in 
the United Kingdom. It has figured promi¬ 
nently in important studies of the rela¬ 
tion between intelligence and educational 
accomplishment. 

Table 2.5 shows the sections and subtests
of the CAT-3. There is a substantial


commonality between the Otis-Lennon and 
CAT-3. This is hardly surprising, as the
tests were developed for the same purpose, 
for the same audience, using a pragmatic 
approach of building content on the content 
of previous tests, rather than conforming to 
a theory of intelligence. 

2.3.3. The SAT

We now look at one of the best-known, 
most-discussed tests used in education, the 
SAT, which is widely used as part of the col¬ 
lege admission process in the United States. 

The SAT is a distant descendant of a
college admissions test developed in the 
1920s by Carl Brigham, a psychologist who 
had worked on construction of the Army 
Alpha Test. Brigham observed that the 
Army Alpha was easy for candidates who 
had some college education. He concluded 
that a test similar to the Army Alpha could 
be developed as a college entrance test, but 
that it would have to be made more difficult. 

Brigham produced a test that was used on 
a trial basis by several universities during the 
1920s and 1930s. In 1933 Harvard's president, 
James Bryant Conant, became interested. 
The tests that Harvard used at the time 
emphasized the content of courses available 
in a relatively small group of expensive East¬ 
ern preparatory schools. Conant knew that 
this gave an advantage to children from fam¬ 
ilies who could afford to send their chil¬ 
dren to these exclusive schools. Accordingly, 
Conant asked one of his faculty members, 
Henry Chauncey, to look into developing 
tests that would identify gifted students who 
had not attended private schools. 

Chauncey himself was a graduate of a 
private school, although he was not excep¬ 
tionally rich. Nevertheless, in athletic par¬ 
lance, he took the ball and ran with it. He 
was impressed by Brigham’s test, but could 
not adopt it due to the disruptions caused 
by the Depression and World War II. After 
the war ended a US presidential commis¬ 
sion concluded that there was a need to 
open up college admissions procedures to a 
wider range of applicants. The commission's 
conclusion mirrored the concerns Conant 
Table 2.6. The subtests of the SAT-I

Each entry gives the test section, the item type, and a comment.


Verbal comprehension
Comment: All questions are multiple choice.

Reading comprehension
Item type: Read short passage and answer multiple-choice questions
Comment: The questions deal with the meaning of words in context,
information in the passage, and information implied by the passage
but not explicitly stated.

Sentence completion
Item type: A sentence is presented with some words omitted.
Comment: The examinee selects words to complete the sentence.


Mathematics
Comment: All questions are multiple-choice or require a numerical answer.

Numerical operations
Item type: Calculation
Comment: Calculators are permitted.

Algebraic functions
Item type: Algebra problems
Comment: Covered to the level of a second-year high school course.

Geometry
Item type: Geometric problems
Comment: Questions cover Euclidean geometry up to three-dimensional
solids.

Statistics and probability
Item type: Questions in elementary statistics and probability theory
Comment: Questions include interpreting graphic displays of data.


had expressed, and Chauncey was prepared 
to respond. 

Chauncey established the Educational 
Testing Service (ETS), a nonprofit corporation,
in 1948. The ETS took over administration
and development of the SAT, and continues
to do so sixty years later. The SAT has
become a watchword for students intend¬ 
ing to enter college, and ETS has become a 
major developer of both yearly updates of 
the SAT and numerous other educational 
and professional tests. 19 

The present SAT is taken in two parts. 
The first part, which the ETS refers to as the 
SAT-I, deals with skills that entering college 
students are expected to have, but that are 
not tied to specific optional parts of the cur¬ 
riculum. The second part of the SAT con¬ 
tains subject-matter specific tests. We will 
generally be concerned only with the first 
part, which will be referred to simply as the 
SAT unless there is a reason to distinguish 
between parts I and II. 

19 Lemann, 1999. 


The SAT is divided into three sections. 
One, introduced in 2005, is a writing test 
in which examinees are given twenty-five 
minutes to write an essay on a specified 
topic. The inclusion of a writing exami¬ 
nation is a good example of the demands 
placed on a screening test, as opposed to 
a test designed to evaluate general mental 
competence. Essay writing is not a universally
required skill in our society, but it
is a skill that is used in college. The writ¬ 
ing examination was included in response 
to complaints by college and university fac¬ 
ulty members that some students who had 
received acceptable scores on the pre-2005 
SAT turned out to have poor writing skills. 

As yet there is insufficient data to show 
how well the writing test score relates either 
to the other tests or to the later performance 
of college students. 

The other two parts of the SAT 
are the “critical reading” (formerly verbal
comprehension) and mathematics sections.
Table 2.6 describes the sorts of questions 
asked. Scores on the sections can range from 
200 to 800. Mean scores on the individual 






Table 2.7. The Subtests of the Armed Services Vocational Aptitude Battery (ASVAB) 


Name of Subtest 

Commentary 

General science 

Questions about science roughly at the level of courses taught in 
US middle schools. 

Arithmetic reasoning 

Numerical calculations. 

Word knowledge 

A vocabulary test. 

Paragraph comprehension 

Read a paragraph and answer questions about it. 

Mathematics knowledge 

Apply simple mathematical formulae. 

Electronics information 

This and the next test are designed to identify enlistees who would 
be good candidates for training in appropriate technical specialties. 

Auto and shop information 

See above. 

Mechanical comprehension 

Tests the understanding of common mechanical problems. 

Object assembly 

Evaluates skill in seeing how objects should be assembled from 
diagrams. 


sections are in the 550 range (varying some¬ 
what from year to year), with a standard 
deviation of approximately 100. The overall 
score is determined by adding the two sec¬ 
tion scores, so 1600 represents a perfect score 
on the SAT. The writing test is scored sepa¬ 
rately. 
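
The arithmetic of the score scale is simple enough to write down. The sketch below uses only the approximate figures quoted in this paragraph (a section mean near 550 and a standard deviation near 100); the function names are hypothetical, and the actual reported scores depend on the testing service's own scaling procedures.

```python
SECTION_MIN, SECTION_MAX = 200, 800
SECTION_MEAN, SECTION_SD = 550, 100  # approximate values quoted in the text


def total_score(critical_reading, mathematics):
    """Add the two section scores; the writing test is scored separately."""
    for score in (critical_reading, mathematics):
        if not SECTION_MIN <= score <= SECTION_MAX:
            raise ValueError(f"section score {score} is outside 200-800")
    return critical_reading + mathematics


def standard_units(section_score):
    """Distance from the section mean, in standard deviations."""
    return (section_score - SECTION_MEAN) / SECTION_SD


perfect = total_score(800, 800)
z_for_650 = standard_units(650)
```

On these figures a section score of 650 lies one standard deviation above the mean, and two perfect sections give the familiar 1600.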

The questions on the SAT are gener¬ 
ally harder than similar questions on tests 
intended for the general population, such 
as the WAIS and comparable group tests, 
because the SAT is intended for the top two- 
thirds of high school graduates, rather than 
for the population at large. 

2.3.4. The Armed Services Vocational 
Aptitude Battery 

The Armed Services Vocational Aptitude 
Battery (ASVAB), administered and devel¬ 
oped by the personnel branch of the US 
Department of Defense, is designed to 
evaluate potential enlistees and to identify 
candidates for training in the military’s many 
occupational specialties, ranging from cooks 
to computer technicians. Like the SAT, the 
ASVAB is updated regularly. Table 2.7 lists 
the subtests of the ASVAB as of 2008. The 
tests may be administered either in paper- 
and-pencil form or, as is now usual in recruit¬ 
ing, via an interactive computer system. 


The Armed Forces Qualification Test 
(AFQT) score is a weighted combination 
of the Arithmetic Reasoning, Mathematics 
Knowledge, Paragraph Comprehension, and 
Word Knowledge scores of the ASVAB. 
This makes the AFQT a combination of 
assessments of mathematical and written 
verbal skills, as is the SAT total score. There 
is a correlation of .82 between overall scores 
on the SAT and the AFQT. 20 The SAT, 
being intended for a more cognitively pow¬ 
erful audience, is considerably harder than 
the AFQT. 

The special-topic subtests of the 
ASVAB - General Science, Electronics 
Information, Auto and Shop Information, 
Mechanical Comprehension, and Object 
Assembly - are used to indicate whether 
a given recruit qualifies for specialized 
training courses (e.g., electronics repair). 

Each service establishes a minimum 
AFQT score that must be achieved for 
enlistment. The Navy and Air Force, both 
of which have a large number of technical 
billets to fill, generally set higher minimums 
than the Army and the Marines. Recruit¬ 
ment standards in all the services vary from 
time to time, due to the needs of the services 
for new recruits, compared to the talent lev¬ 
els in the applicant pools. 

20 Frey and Detterman, 2004. 
In addition to its use in military recruiting, 
the ASVAB has been used by the 
Department of Labor in research on the US 
workforce. 

2.4. Single-Format Tests 

While tests like the SAT, CAT-3, and 
ASVAB are economical from the viewpoint 
of the institutions that use them, they still 
require several hours of the examinee’s time. 
Industrial personnel screening and psycho¬ 
logical research programs have a need for 
tests that make fewer demands on the peo¬ 
ple who take them, even though the result¬ 
ing examination may be less detailed. 

2.4.1. Baddeley’s Three-Minute Reasoning 
Test as an Illustration of a Speeded Test 

One strategy for designing a quick test of 
cognitive skill is to shift from measuring a 
person’s accuracy in answering reasonably 
complex questions, to measuring the speed 
with which that person can answer simple 
questions. The argument for doing so is that 
those people who can answer hard prob¬ 
lems, given time, are also likely to be able 
to answer easy problems, quickly. 

The British psychologist Alan Baddeley 
developed a test of verbal intelligence based 
on a prototypical linguistic act, determin¬ 
ing whether a sentence correctly describes a 
visual situation. Questions on this test take 
the form 


A B    The A is before the B       True_ False_ 

B A    The A is not after the B    True_ False_ 


Individually, these questions are easy; no 
one who reads English should ever get one 
wrong. The sentences vary in linguistic com¬ 
plexity, although that complexity is well 
short of the complexity that might tax a pro¬ 
ficient reader. While only two pictures are 
possible, A B or B A, there are eight possi¬ 
ble questions, composed by varying whether 
the question asks about the position of A 
relative to B, or B relative to A, whether 


the relation specified is before or after, 
and whether the descriptive statement is 
expressed positively, as in the first exam¬ 
ple just given, or negatively, as in the second 
example. When taking Baddeley’s test it is 
not hard to answer the question, but it may 
take a little effort to figure out what the 
question is. 
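
The eight question forms can be enumerated mechanically. The sketch below is only an illustration of the counting argument in the text; the function names and the string encoding of the two pictures ("AB" and "BA") are assumptions made here, not part of Baddeley's materials.

```python
from itertools import product

def make_sentence(subject, relation, negated):
    """Build one of the eight sentence forms, e.g. 'The A is not after the B'."""
    other = "B" if subject == "A" else "A"
    verb = "is not" if negated else "is"
    return f"The {subject} {verb} {relation} the {other}"

def is_true(picture, subject, relation, negated):
    """Decide whether the sentence correctly describes the picture ('AB' or 'BA')."""
    other = "B" if subject == "A" else "A"
    subject_first = picture.index(subject) < picture.index(other)
    holds = subject_first if relation == "before" else not subject_first
    # A negated sentence is true exactly when the positive relation fails.
    return holds != negated

# Subject (A or B) x relation (before or after) x polarity (positive or negative).
SENTENCE_FORMS = list(product(["A", "B"], ["before", "after"], [False, True]))
PICTURES = ["AB", "BA"]

if __name__ == "__main__":
    for picture in PICTURES:
        for subject, relation, negated in SENTENCE_FORMS:
            sentence = make_sentence(subject, relation, negated)
            answer = is_true(picture, subject, relation, negated)
            print(picture, "|", sentence, "|", answer)
```

Each picture pairs with four true and four false sentences, so an examinee cannot do well by always giving the same response.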

The score on Baddeley’s test is the num¬ 
ber of questions a person can answer in just 
three minutes. Simple as it is, this score 
has a correlation of .6 with scores on ver¬ 
bal intelligence tests that take over an hour 
to administer. 21 

What Baddeley showed is that the ability 
to do certain simple verbal manipulations, 
rapidly, is correlated with the ability to do 
such complex things as comprehending a 
paragraph. Therefore, the quick test can be 
used to identify good verbal comprehenders 
without asking them to do much compre¬ 
hension. This is the sort of finding that might 
have infuriated Walter Lippmann, because 
the test of comprehension only minimally 
involves comprehension. The same finding 
would have delighted Prof. Boring, because 
the test worked. 

2.4.2. The Wonderlic Personnel Test 

Baddeley’s test was developed for research 
purposes, not for the commercial market. 
The principle behind Baddeley’s test, eval¬ 
uating intelligence by seeing how quickly 
people answer simple questions, has been 
incorporated into a widely used industrial 
test, the Wonderlic Personnel Test (WPT). 22 
The WPT can be thought of as a hyper- 
compressed battery in which examinees are 
asked to answer simple verbal questions, do 
arithmetic, and solve simple logical prob¬ 
lems. The different types of questions are 
mixed up, so examinees have to switch 
rapidly from solving one type of problem to 
solving another. The WPT consists of fifty 
questions similar to those shown in Table 
2.8. The score is the number of questions 
that can be answered correctly within twelve 

21 Baddeley, 1968. 

22 Wonderlic, 1992. 




Table 2.8. Sample questions illustrating the Wonderlic Personnel Test, revised edition (WPT-R, 2007 revision) 

Question 1 Which of the following is the earliest date? 

A) Jan. 16, 1898 B) Feb. 21, 1889 C) Feb. 2, 1898 D) Jan. 7, 1898 E) Jan. 30, 1889 

Question 2 LOW is to HIGH as EASY is to __?__ 

J) SUCCESSFUL K) PURE L) TALL M) INTERESTING N) DIFFICULT 

Question 3 A featured product from an Internet retailer generated 27, 80, 99, 113 and 213 orders over a 5-hour period. Which graph 
below best represents this trend? 

[Answer choices A through E are graphs, not reproduced here.] 

Question 4 What is the next number in the series 29 41 53 65 77 __?__ 

J) 75 K) 88 L) 89 M) 98 N) 99 

Question 5 One word below is underlined. What is the OPPOSITE of the word? 

She gave a _complex_ answer to the question and we all agreed with her. 

A) long B) better C) simple D) wrong E) kind 

Note: The items were provided courtesy of Wonderlic, Inc. The underlined word in example question 5 appeared in blue print in the 
original. Underlining has been added here to indicate the content of the question. 
minutes. Research conducted in the 1970s, 
well after the Wonderlic Test was introduced, 
has shown that the ability to “change set” by 
switching between tasks is an important part 
of intelligence. 

WPT scores are highly correlated with 
the full-scale IQ score of the much longer 
WAIS. This makes the WPT an attractive 
test for personnel screening, especially in 
situations where there is a concern that 
the applicant meet a minimum standard 
of cognitive ability. In one high-profile 
application, the National Football League 
has used the WPT to screen players trying 
out for a career in professional football. 
Having a high WPT score will not get you a 
lucrative American football career. Having 
a low WPT score may make the scouts and 
coaches think twice, for there is a cognitive 
component to the life of a professional 
player. 

2.4.3. Raven's Progressive Matrices Tests: 
A Non-speeded, Single-Format Test 

We next examine tests that use the Pro¬ 
gressive Matrix item format, which is a for¬ 
mat designed to evaluate inductive reason¬ 
ing. These tests are widely used in research, 
and have also been used in personnel selec¬ 
tion, particularly in Europe. 23 Progressive 
matrix tests represent something of a con¬ 
trast between an emphasis on speed and 
accuracy. The item format can be used to 
present problems that vary widely in diffi¬ 
culty, so part of the score depends on how 
hard a problem the examinee can solve. The 
tests are usually given with a time limit, so 
the score also depends upon how quickly the 
examinee can find a solution. 

Progressive matrix tests were devel¬ 
oped by John C. Raven in 1938. Raven 
had studied with Charles Spearman, who 
believed that the defining characteristic 
of intelligence was the ability to detect 
and manipulate complex patterns appear¬ 
ing in observations. 24 He referred to this 

23 J. Raven & J. Raven, 2008. 

24 Spearman, 1923. Spearman’s ideas are further dis¬ 
cussed in Chapter 3. 


ability as eduction; most modern researchers 
would probably say "inductive reasoning." 
Raven constructed three tests to evaluate 
inductive reasoning: the Colored Progres¬ 
sive Matrices (RCPM) test, for children, the 
Standard Progressive Matrices (RSPM) test, 
for general use, and the Advanced Progres¬ 
sive Matrices (RAPM) test for evaluating 
people of above average intelligence, such 
as college students. The progressive matrix 
format has since been adopted for subtests in 
a number of battery-type tests, including the 
WAIS-IV. 

Figure 2.2 shows two progressive matrix 
questions. The examinee is shown a 3 x 3 
matrix, in which each entry is a geometric 
figure. The entries differ in some systematic 
way along both the rows and the columns. 
The lower right-hand entry (position (3,3) 
in matrix notation) is blank. The task is to 
select one of the alternative answers at the 
bottom in order to complete the matrix. In 
order to do this the examinee has to solve 
one problem defined by variation across 
rows, then keep the solution “in the head” 
while working on another problem defined 
by variation across columns. As Figure 2.2 
illustrates, problems can vary considerably 
in difficulty. 

The Raven tests take about forty-five 
minutes to complete, making them a cheap 
alternative compared to the SAT or the 
WAIS. A more recent form administered by 
an interactive computer system takes even 
less time. 

Raven’s own matrix tests always use fig- 
ural patterns, as in the example. Progressive 
matrix tests can also be constructed using 
numerical and verbal material. What num¬ 
ber should be used to fill in this matrix? 

    1  2  3 

    2  4  6 

    4  8  _ 
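
One way to work the numerical example is sketched below. The two rules are a reading of the pattern assumed here for illustration (each row's third entry is three times its first; each column doubles going down); the text itself does not state them, and other rules could fit the same entries.

```python
# The matrix from the text, with None marking the blank cell at position (3,3).
MATRIX = [
    [1, 2, 3],
    [2, 4, 6],
    [4, 8, None],
]

def candidate_from_rows(m):
    """Assumed row rule: the third entry is three times the first entry."""
    if not all(row[2] == 3 * row[0] for row in m[:2]):
        raise ValueError("row rule does not hold for the complete rows")
    return 3 * m[2][0]

def candidate_from_columns(m):
    """Assumed column rule: each entry is double the entry above it."""
    complete_columns = range(2)
    if not all(m[r + 1][c] == 2 * m[r][c] for c in complete_columns for r in range(2)):
        raise ValueError("column rule does not hold for the complete columns")
    return 2 * m[1][2]

if __name__ == "__main__":
    print("row rule suggests:", candidate_from_rows(MATRIX))
    print("column rule suggests:", candidate_from_columns(MATRIX))
```

Under these assumed rules both directions agree on the same value, which mirrors the way an examinee must hold the row solution in mind while checking it against the columns.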

It has been claimed that progressive 
matrix tests are the best measures of general 
intelligence that we have. 25 The claim 
is based on statistical evidence indicating 
25 Jensen, 1998. 
Figure 2.2. Two progressive matrix items. The 
task is to complete the 3 x 3 matrices by 
selecting the appropriate pattern from the eight 
patterns below the line. The items have been 
made up by the present author, to illustrate the 
technique. 

that scores on these tests are associated with 
whatever common trait underlies perfor¬ 
mance on all subtests of a test battery, such 
as the subtests of the SAT or the WAIS. This 
argument will be described in Chapter 4. 

A number of researchers have used pro¬ 
gressive matrix tests in cross-cultural stud¬ 
ies, and have sometimes made claims about 
the relative intelligence of different groups 
based upon the results. The argument for 
doing so is that the ability to find patterns 
in observations is a cognitive ability that 
is required in all societies. This is a very 
strong claim. It assumes both that the pro¬ 
gressive matrix item format is appropriate 
and that the “Drop in from the Sky” testing 
paradigm is an appropriate means of evalua¬ 
tion outside of industrial and post-industrial 
societies. Such claims raise complex issues, 
for validity is not an “either/or” issue. A 
test may be valid to different degrees in 
different societies, and societies themselves 
do not fall into precisely defined categories, 


such as "developed,” "industrial,” and "post¬ 
industrial” societies. This matter will be pur¬ 
sued further in Chapter 11. 

We now turn to some general issues that 
are raised in the construction and use of the 
tests. 


2.5. Themes in Testing 

There is a great deal of commonality in the 
cognitive skills the various tests evaluate. 
Here are some of the recurrent themes. 


2.5.1. Language Use 

Virtually all battery-type tests evaluate lan¬ 
guage skills. There is a good case for doing 
so. Although a person cannot participate 
fully in human society without language 
skills, there are substantial individual differ¬ 
ences in the degree to which people pos¬ 
sess these skills. Everyone learns to produce 
and comprehend a passable version of his or 
her native language. Very few people reach 
the levels of comprehension and expression 
illustrated by Shakespeare and Cervantes. 

In intelligence testing an important dis¬ 
tinction is made between language familiar¬ 
ity and language comprehension. Language 
familiarity is usually assessed by a vocabu¬ 
lary test. Language comprehension is tested 
by asking examinees to explain the mean¬ 
ing of sentences or paragraphs. In theory 
one might argue for separate evaluations of 
comprehension of the written and spoken 
language. The case for doing so rests on 
a distinction articulated by the evolution¬ 
ary psychologist David Geary, who main¬ 
tains that we have genetically determined 
“primary” mental capacities, common to all 
societies, and socially acquired “secondary” 
capacities that are required in only some of 
them. 26 Use of the spoken language is clearly 
a primary capacity. Normal children learn to 
speak without explicit tuition, simply being 
reared by speakers of the local language. Lit¬ 
eracy is a secondary skill. Reading and writ¬ 
ing are acquired through instruction, and the 

26 Geary, 2005. 
success of this instruction varies greatly. It is 
possible, for instance, to be competent in the 
spoken language and still be nearly unable 
to read (dyslexic). Adults can suffer brain 
injuries that render them unable to read or 
write even though they can still speak. This 
is called acquired dyslexia. This contrasts 
with developmental dyslexia, which is mani¬ 
fested when an apparently healthy child has 
abnormal difficulty learning to read. 

The dyslexias are an exception to the 
general rule that skills in spoken and writ¬ 
ten language are highly correlated, a find¬ 
ing that has been obtained in two different 
populations that have considerably different 
language skills: American college students 
and military enlisted personnel. 27 Finding 
that the same relationship occurs in differ¬ 
ent populations provides strong evidence for 
the generality of the relationship. 

2.5.2. Visual-Spatial Reasoning 

Almost all battery-type tests that are 
avowedly evaluations of intelligence mea¬ 
sure some form of visual-spatial reasoning. 
The tasks used vary in the extent to which 
they involve perceptual or reasoning pro¬ 
cesses. At the perceptual end, some tests 
require the identification of a pattern hid¬ 
den within another or identification of two 
patterns as the same, different, or perhaps 
one as a mirror reflection of the other. An 
example problem is shown in Figure 2.3. 
The reasoning end is represented by pro¬ 
gressive matrix problems and similar tasks, 
in which a (possibly abstract) pattern must 
be extracted from visual displays. 

There has been a good deal of contro¬ 
versy over whether visual-spatial reasoning 
tests should be included in test batteries 
used in educational and industrial person¬ 
nel selection programs, such as the SAT 
and the ASVAB. At present (2010) neither 
battery assesses visual-spatial reasoning. 
Those who would include such tests argue 
that visual-spatial reasoning is an important 

27 Palmer et al. (1985) present data for college students; 

Sticht (1975) presents data for military enlisted per¬ 
sonnel. 


part of cognition in some courses of study, 
such as architecture and engineering, and 
in a variety of mechanical trades of use 
to the services. 28 Visual-spatial reasoning is 
assessed in some detail in personnel selec¬ 
tion for some occupations, especially in 
aviation. 

The argument against assessing visual- 
spatial reasoning is that there are many areas 
where the ability is not required. Visual- 
spatial reasoning is itself a complex domain. 
There are substantial male-female differ¬ 
ences in performance on some tests of visual- 
spatial reasoning. Therefore, if a person¬ 
nel selection program gave undue weight to 
visual-spatial reasoning, the selection system 
might be considered biased against women. 

2.5.3. Mathematical Reasoning 

Many, but not quite all, attempts to evalu¬ 
ate intelligence include some evaluation of 
mathematical reasoning. In the simplest case 
this amounts to nothing more than a test of 
how rapidly one can do simple arithmetic. 
See, for instance, the numerical problems on 
the Wonderlic Personnel Test (Table 2.8). 

Are tests of mathematical reasoning tests 
of one’s academic background, or are they 
tests of general intellectual competence? 
The answer is “probably a little bit of both.” 
Citing Geary and the evolutionary psychol¬ 
ogists once again, 29 humans are genetically 
programmed for some aspects of numerical 
reasoning, such as the idea of distinguishing 
specific numbers of objects rather than mak¬ 
ing a binary one-many distinction. These 
skills are rudimentary mathematics, com¬ 
pared to the calculus! Acquisition of sub¬ 
stantial mathematical skill clearly depends 
upon having an appropriate cultural back¬ 
ground. Therefore, in Geary’s classifica¬ 
tion, mathematical skills beyond a rudimen¬ 
tary level are secondary skills. Nevertheless, 
there is a good reason to assess them. Skill 
in mathematics is central to functioning in 
developed societies. 

28 Hegarty, Just, & Morrison, 1988; Humphreys & 

Lubinski, 1996. 

29 Geary, 2007a. 
Figure 2.3. An example of a visual-spatial 
reasoning problem. Three of the four figures 
numbered 2 through 5 can be made identical to 
figure 1 by rotating them in the picture plane 
without lifting or turning the figure. Which of 
the figures cannot be made identical to figure 1 in 
this way? 


2.5.4. Deductive and Inductive Reasoning 

The ability to reason is widely accepted as 
a sign of intelligence. Two different types 
of reasoning are recognized, deductive and 
inductive reasoning. In deductive reasoning 
an examinee is told to assume that certain 
statements are true, and is asked to draw 
conclusions from them. Examples are 


Categorical syllogism: All the girls in Ms. 
Jones's class went on the field trip yester¬ 
day. Ann and Sally are in Ms. Jones's 
class. Where were Ann and Sally yester¬ 
day? 

Deduction: John and Sam drink beer only 
when they are together. Yesterday John 
was in town and Sam was out of town. 
Did John drink beer yesterday? 


Inductive reasoning is the process of 
abstracting general rules from observation of 
specific cases. Progressive matrix tests were 
developed to evaluate this ability. Inductive 
reasoning is also evaluated using other formats. 
Here are some widely used ones: 

Similarity: Which of the following cities 
does not belong in the group: San Francisco, 
Las Vegas, San Diego, St. Paul? 

Series completion: Complete the next number 
in the following series: 3:5:7_? 

Analogies: Choose the correct answer to 
complete the analogy: Black is to white 
as right is to: left, error, up, tight. 

There are logical and empirical arguments 
for evaluating abstract deductive reasoning. 

Syllogisms and categorical reasoning are 
central to Western notions of mathematics, 
law, and rational argument. Mothers tell 
children, “If you don’t eat your vegetables, 
you can’t have dessert.” Chocolate-loving 
children are supposed to draw an appropriate 
conclusion. 

Empirically, scores on abstract reasoning 
tests can be used to predict performance 
on other applications of intelligence, such 
as paragraph comprehension, mathematics, 
and (to a lesser extent) the ability to solve 
visual-spatial reasoning problems. Because 
abstract reasoning predicts so many other 
types of thinking, it is reasonable to believe 
that abstract reasoning is either a central part 
of intelligence, itself, or that it is closely tied 
to something that is. 

However, there is a case against stressing 
it too much. Although abstract reasoning is 
dear to Western academic and scientific circles, 
outside these circles some people think 
of abstract reasoning as a sort of word game, 
with little intellectual content. Let me give 
two examples. 

In one of the Sherlock Holmes stories, 
“Silver Blaze,” the great detective deduces 
that an intruder did not break into a stable 
to steal a horse because a watchdog did not 
bark in the night. This is an example of the 
classic syllogism: 

Premise: A implies B. If there is an intruder 
the dog will bark. 

Observation: Not B. The dog did not bark. 

Conclusion: Therefore, Not A. Therefore, 
there was no intruder. 
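
The validity of this argument form (known in logic as modus tollens) can be checked by brute force over every truth assignment. The sketch below is illustrative only; the function names are assumptions introduced here.

```python
from itertools import product

def implies(a, b):
    """Material implication: 'A implies B' is false only when A is true and B is false."""
    return (not a) or b

def premises_hold(a, b):
    """Premise 'A implies B' together with the observation 'Not B'."""
    return implies(a, b) and (not b)

def modus_tollens_is_valid():
    """The form is valid if 'Not A' holds in every case where both premises hold."""
    for a, b in product([False, True], repeat=2):
        if premises_hold(a, b) and a:
            return False  # a counterexample: premises true, conclusion false
    return True

if __name__ == "__main__":
    # a = "there is an intruder", b = "the dog barks"
    for a, b in product([False, True], repeat=2):
        print(f"intruder={a!s:5} barks={b!s:5} premises_hold={premises_hold(a, b)}")
    print("valid:", modus_tollens_is_valid())
```

The check confirms only that the conclusion follows if the premise is accepted; it says nothing about whether the premise is true of real watchdogs.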

When I used this example in an under¬ 
graduate class on critical thinking one of my 
students said that that was a nice made-up 
story, but that watchdogs do not always bark 
when strangers appear. Her more general 
point was that the real world is far more 
complicated than the abstract world of log¬ 
ical reasoning. She did not see the point in 
learning ways of thought that were useful 
only in the abstract world. She is not alone 
in her attitude. 

Abstract reasoning is actually proscribed 
in some cultures. Consider the example 
of the two men who always drank beer 
together, given earlier as an example of a 
syllogism. The example is a paraphrase of 
an item that was used in an anthropological 
investigation of reasoning across cultures. 30 
Rural Liberian tribesmen said that the ques¬ 
tion was not reasonable, on the grounds that 
they did not know the individuals involved, 
and one ought not to draw conclusions 
about things that one has not experienced 
personally. 

Both these arguments can also be used 
against the inclusion of abstract inductive 
reasoning items in intelligence testing. In 
addition, there is another objection. The 
answer to an inductive reasoning problem 
is never uniquely determined. This is shown 
by the example in which an examinee was 
asked to find the dissimilar city in the set 
(San Francisco, San Diego, Las Vegas, St. 
Paul). One could argue that the first three 
cities are in the Far West of the United 
States, while St. Paul is in the Midwest. 
Or that the first three cities have Spanish- 
derived names, while St. Paul does not. Or 
that Las Vegas is the only city not named 
for a Christian saint. There are many ways 
in which the four items can be compared; 
the question does not specify which one is 
to be used. 
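
The point can be made concrete with a small sketch. The three features below are the ones named in the paragraph; the dictionary layout and the function name are assumptions made here for illustration.

```python
from collections import Counter

# Feature values taken from the comparisons described in the text.
CITIES = {
    "San Francisco": {"region": "Far West", "spanish_name": True, "named_for_saint": True},
    "San Diego": {"region": "Far West", "spanish_name": True, "named_for_saint": True},
    "Las Vegas": {"region": "Far West", "spanish_name": True, "named_for_saint": False},
    "St. Paul": {"region": "Midwest", "spanish_name": False, "named_for_saint": True},
}

def odd_one_out(cities, feature):
    """Return the single city whose value on `feature` is unique, or None if there is no such city."""
    counts = Counter(attrs[feature] for attrs in cities.values())
    unique_values = [value for value, n in counts.items() if n == 1]
    if len(unique_values) != 1:
        return None
    return next(name for name, attrs in cities.items() if attrs[feature] == unique_values[0])

if __name__ == "__main__":
    for feature in ("region", "spanish_name", "named_for_saint"):
        print(feature, "->", odd_one_out(CITIES, feature))
```

The "correct" answer changes with the feature chosen, and nothing in the question itself tells the examinee which feature to use.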

Similar arguments can be made against 
the use of series or analogy problems, includ¬ 
ing progressive matrix tests. In all cases the 
answer is determined by consensus, and the 
consensus opinion may be different in dif¬ 
ferent cultures. Here is another example. 

Which two of these animals belong together: 
fox, cat, dog? 

When University of Michigan students 
were asked this question most of them said 
that the fox and the dog belong together 
because they are both canids. When Central 
American Amerindians were asked the same 
question their preferred answer was that the 
fox and the cat belong together because of 

30 Cole et al., 1971. 


their similar behavior. The Michigan stu¬ 
dents preferred a taxonomic grouping; the 
Amerindians preferred an ecological one. 31 
Who is to say which answer is correct? 

This example is by no means an iso¬ 
lated one, restricted to a contrast between 
students in a post-industrial society and 
members of a (somewhat) traditional agrar¬ 
ian culture. Richard Nisbett, a professor at 
the University of Michigan who has made 
extensive studies of cultural influences 
on thought, has offered similar examples 
involving contrasts between American 
and Eastern (Asian) styles of thought. 
Nisbett has argued that the 
American-European emphasis on focusing 
on only the perceived relevant aspects of a 
situation, and applying formal logic to those 
aspects only, is a marked contrast to an East¬ 
ern emphasis on being sensitive to the total 
context of a problem. 32 

2.5.5. Another Way to Look at Content: 
Aptitude versus Achievement 

A distinction is sometimes made between 
aptitude and achievement tests. The distinc¬ 
tion can be illustrated by comparing three 
different college admissions tests: the SAT-I, 
the SAT-II, and the American College Test 
(ACT). The SAT-I is said to stress aptitude, 
because its content is not tied to specific 
course curricula. The SAT-II and the ACT 
contain subtests tied to courses, tests of his¬ 
tory, literature, science, and mathematics. 33 
What is the difference? 

Achievement is simple enough. You have 
successfully studied English, mathematics, 
history, physics, or anything else if you know 
the relevant facts and principles. But what 
is meant by “aptitude?” 


31 Lopez et al., 1997. 

32 Nisbett, 2009, pp. 162-170. 

33 The ACT was originally developed by E. Lindquist, 
a psychologist who was involved in the early 
development of the SAT. Lindquist disagreed 
vehemently with Chauncey's decision to emphasize 
aptitude over achievement. He believed that it was more 
appropriate to evaluate a person's readiness for 
further education by determining what the person had 
learned up to the point of testing. 
According to Richard Snow, an educational 
psychologist from Stanford University, 
“aptitude” implies that a person has 
a talent for doing something - politics, 
music, athletics, and so forth. 34 “Achievement” 
implies that a person already has done 
something in a certain field. A person who 
has an aptitude for a field should do well 
if he or she trains in that field, but having 
“aptitude” does not imply that the training 
has taken place. You can have an aptitude 
for playing music without knowing how to 
play the clarinet. But if you have the apti¬ 
tude, learning to play should be easy. 

The SAT-I was once explicitly called a 
scholastic aptitude test because it was sup¬ 
posed to identify students who could be suc¬ 
cessful if they went to an American college 
or university, without stressing the knowl¬ 
edge acquired in particular courses. By con¬ 
trast, the SAT-II and ACT test identified 
topics, such as mathematics. However, the 
SAT-I is not entirely knowledge-free. For 
instance, it assumes that examinees know 
the English language, but does not assume 
that they have had a course in English 
literature. 

Progressive matrix tests are aptitude tests 
that make even fewer demands on knowl¬ 
edge. However, users of these tests assume 
that the examinee understands the basic 
testing situation, and there are various 
strategies for taking such tests that depend 
upon cultural knowledge. 

The developer of an achievement test 
assumes that all examinees will have had 
certain experiences - for example, a course 
in American history. The test is intended to 
determine what the examinee learned from 
the experience. The argument for using 
achievement tests as screening devices is 
that one of the best predictors of how well 
a person will do at learning something in 
a new situation is how much the person 
has learned in comparable situations. By 
this logic, the best predictor of grades in 
first-year college physics is a test of how 
much the student learned in high school 
physics. 

34 Snow, 1996. 


In educational settings it turns out that 
very much the same admission decisions are 
made regardless of whether an aptitude or 
an achievement test is used. While there 
are cases of people who score highly on 
the SAT-I and do poorly on the ACT, and 
vice versa, at the population level the SAT 
and the ACT predict success or failure for 
almost the same people. 35 On the whole, 
if you have an aptitude for academic stud¬ 
ies, then you will have learned a lot from 
the classes you have already had; if you do 
not have the aptitude, you will not. This is 
not to deny the fact that late bloomers do 
exist; there are people who have the tal¬ 
ent to succeed in academics but who, for 
a variety of reasons, have not learned very 
much in high school. Conversely, there are 
people who do not do well on abstract apti¬ 
tude tests, but are quite good at learning 
academic material. These cases are excep¬ 
tions to the rule. Academic achievement 
and academic aptitude test scores are highly 
correlated. 

2.6. Test Creation and Use 

We now move from a consideration of the 
content of cognitive tests to some more 
general issues about how the tests are 
developed. 

2.6.1. Item Selection and Evaluation 

The questions on intelligence tests have 
been the butt of quite a few jokes and sar¬ 
castic remarks. Here is what one popular sci¬ 
ence writer admitted that he believed prior 
to looking into the issue. 

the questions on IQ tests are all written by 
graduate students from Connecticut and 
begin “Teddy leaves Sag Harbor on the 
brunchtime jitney ..." 

B. Maddox, writing in Discovery, 
March 2008, p. 18 

35 Koenig, Frey, and Detterman, 2008. These authors 
also show that the ACT has a substantial predictive 
correlation, .61, with the Raven Progressive Matrices 
test, which is in no sense an achievement test! 
Maddox’s comment is not too far from 
Walter Lippmann’s characterization of the 
first Stanford-Binet test, eighty years earlier 
(see Chapter 1). Neither criticism addresses 
the rigorous procedures used for item selec¬ 
tion. And, in fairness to Maddox, later in his 
article he explains that his initial prejudices 
were wrong, and he delves into some of the 
issues surrounding intelligence testing in a 
reasonable way. 

Candidate questions on all intelligence 
tests are initially developed either by identi¬ 
fying a group of people who are thought to 
be intelligent to varying degrees, and seeing 
what sorts of problems they can solve, or by 
generating questions from a theory of what 
intelligence is. The candidate questions are 
then given to a sample of people who are 
thought to be typical of the population for 
whom the test is intended. One way to do 
this is to insert a new question into an exist¬ 
ing test. Answers to the new question are not 
counted as part of the score on the existing 
test, but statistics are gathered on them. If 
these statistics meet certain criteria, the new 
item is incorporated into the next revision of 
the test. 

If a test question is a good one, scores on 
that question should be positively correlated 
with people’s scores on the other items on 
the test. Consider an analogy between taking 
a test and a high jump competition, where 
competitors have to jump over a bar set at 
various heights. Suppose a competitor can 
jump over (“clear”) a 1.75 m bar (5 ft., 9 in.), 
but cannot jump over a bar set to 1.83 m 
(6 ft.). We would expect this competitor 
to be able to jump over a 1.6 m bar, but 
not to be able to jump over a 1.9 m bar. 
More generally, if a jumper can clear a bar 
at height x, but cannot clear a bar at height 
y (y > x), we expect the jumper to clear all 
heights lower than x, and not be able to clear 
a bar at heights higher than y. 
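
The requirement that a good question correlate positively with the rest of the test can be sketched as follows. The response matrix is hypothetical data invented here (rows are examinees, columns are items, 1 means correct), and correlating an item with the sum of the other items is one common way to do the check, assumed for this illustration rather than taken from the text.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def item_rest_correlation(responses, item):
    """Correlate scores on one item with the total score on all the other items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)

# Hypothetical responses: six examinees, four items.
RESPONSES = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]

if __name__ == "__main__":
    for item in range(4):
        print("item", item, "item-rest correlation:", round(item_rest_correlation(RESPONSES, item), 2))
```

In this made-up data the last item tends to be answered correctly by the examinees who do worst on everything else, so it would be a candidate for removal.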

Suppose we wanted to construct a ten-item 
test of the ability to solve word problems. 
According to psychometrics, the term 
for the science of test construction, we 
should look for ten problems that (a) could 
be ordered in terms of difficulty, defined by 
the percentage of people who can solve each 


problem, and (b) behave like the high jump 
bar - if a person can solve a problem at difficulty
level x but cannot solve a problem at
difficulty level y, where y is greater than x,
then the person should solve all problems
with difficulty level less than x and not solve
any problems with difficulty level greater 
than y. 

What I have just described is an ideal 
situation. In practice, there will always be 
some people who fail to solve a problem at 
one level of difficulty, but do occasionally 
solve problems at a higher level of difficulty. 
Therefore, a statistical technique called Item 
Response Theory has been developed to select 
questions that can be thought of as measuring
the same thing. 36 Items that have this
property are said to be scalable. The point 
to remember is that a great deal of care is 
taken to select test questions that appear to 
be evaluating the same skill, but at different 
levels of difficulty. 
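
The ideal "high jump" pattern described above can be checked with a few lines of code. The following is a minimal sketch of a deterministic scalability check, not Item Response Theory itself; the function names and the small data set are hypothetical and chosen only for illustration. Items are ordered by the fraction of people who solve them, and a person violates the ideal pattern if he or she fails an easier item but solves a harder one.

```python
def difficulty_order(responses):
    """Return item indices sorted from easiest to hardest.

    responses: list of per-person lists of 0/1 item scores.
    Difficulty is judged by how many people solve each item.
    """
    n_items = len(responses[0])
    solved = [sum(person[i] for person in responses) for i in range(n_items)]
    # More people solving an item means it is easier, so sort descending.
    return sorted(range(n_items), key=lambda i: -solved[i])


def count_scale_violations(responses):
    """Count people who fail an easier item but solve a harder one."""
    order = difficulty_order(responses)
    violations = 0
    for person in responses:
        ordered = [person[i] for i in order]
        # A perfectly scalable pattern is a run of 1s followed by 0s.
        if ordered != sorted(ordered, reverse=True):
            violations += 1
    return violations


# Hypothetical data: five people, three items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],  # fails the easiest item yet solves a harder one
    [1, 0, 0],
]
print(difficulty_order(data), count_scale_violations(data))
```

In practice some violations are always expected, which is why a statistical model is used in place of an all-or-none rule like this one.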

2.6.2. Differential Impact 

Avoiding differential impact is part of item 
selection. Differential impact means that an 
item is selectively difficult for some sub¬ 
population of test takers, compared to the 
general population. The idea may be best 
illustrated by an example, once again using 
mathematics word problems. 

Baseball problem, version 1. The Detroit 
baseball team was ahead of the Seattle 
team by one run. But at the very end of 
the game the Seattle team scored two runs. 

Did the Seattle team or the Detroit 
team win the game? 

This is a simple arithmetic problem. You 
can solve it without knowing very much 
about baseball. Now look at version 2. 

Baseball problem, version 2. It was the bot¬ 
tom of the ninth and the Tigers were up 
by one. The first Mariner walked, and the 
next one up homered. 

Did the Seattle team or the Detroit 
team win the game? 

36 Embretson & Reise, 2000. 
For anyone who knows baseball this is the
same simple arithmetic problem as version
1. If you do not know baseball, version 2 is
harder than version 1, because version 2 has 
been written using baseball jargon. If you do 
not know that the names of the Detroit and 
Seattle teams are, respectively, the Tigers 
and the Mariners, all you can do is guess 
what the answer might be. Version 2 has 
differential impact on people who are or are 
not familiar with baseball. 

This example was made up to make a 
point. However, it does resemble objections 
that have been made in realistic situations. 
For instance, on occasion tests have been 
attacked for differential impact on African 
Americans because they were written using 
the vocabulary and syntax of the majority 
population. 

Avoiding differential impact does not mean that
tests and items have to be equally difficult 
for all groups. To see why, consider again 
the high jump competition. Women, in gen¬ 
eral, cannot jump as high as men. (As of 
April 2010 the men's record was 2.45 m, and 
the women's 2.09 m.) If we were to set a 
high jump bar at a randomly chosen height 
between one and two meters, we would find 
that a higher percentage of men than women 
could jump over the bar. Nevertheless, high 
jump bars do not have differential impact, 
because they meet the criterion for equal 
scalability in both groups; the difficulty of 
jumping over a bar increases with height for 
both men and women. The same argument 
applies to cognitive tests. Tests and items 
do not have differential impact if they scale 
equivalently in each group. This means that 
the order of difficulty must be the same for 
all items, and in addition that the items must 
meet some further statistical criteria speci¬ 
fied by item response theory. 
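
The first of these conditions, that items keep the same order of difficulty in each group, can be sketched as follows. This is only an illustration under simplifying assumptions: it checks the ordering alone and ignores the further statistical criteria of item response theory. The function names and the three small groups are hypothetical.

```python
def proportion_correct(responses):
    """Fraction of people answering each item correctly (0/1 scores)."""
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(p[i] for p in responses) / n_people for i in range(n_items)]


def easiest_to_hardest(props):
    """Item indices ordered from highest to lowest proportion correct."""
    return sorted(range(len(props)), key=lambda i: -props[i])


def same_difficulty_order(group_a, group_b):
    """True if both groups rank the items in the same order of difficulty."""
    return (easiest_to_hardest(proportion_correct(group_a))
            == easiest_to_hardest(proportion_correct(group_b)))


# Hypothetical groups of four people each, three items.
group_a = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # 0.75, 0.50, 0.25
group_b = [[1, 1, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0]]  # 0.50, 0.25, 0.00
group_c = [[1, 0, 1], [1, 0, 1], [1, 0, 0], [0, 1, 0]]  # 0.75, 0.25, 0.50

print(same_difficulty_order(group_a, group_b))  # lower scores, same order
print(same_difficulty_order(group_a, group_c))  # item order differs
```

Group b finds every item harder than group a does, yet the items scale the same way, as with the high jump bars. Group c reverses the order of two items, which is the kind of pattern that would prompt a closer look.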

This definition of differential impact is 
one based on psychometric theory. Law 
courts have accepted definitions of differ¬ 
ential impact that are somewhat differ¬ 
ent, and that change from time to time 
and from jurisdiction to jurisdiction. The 
psychometric definition tells test develop¬ 
ers what sort of items they should avoid. 
Satisfying the psychometric definition will 


generally, but not always, also satisfy the 
legal definition. 

2.6.3. The Distribution of Test Scores 

It is often said that intelligence is “nor¬ 
mally distributed.” The facts are a bit more 
complex. We need to distinguish between 
test scores, IQ and similar metrics, and the 
underlying concept of intelligence as a prop¬ 
erty of an individual. Again we draw on 
an analogy between intelligence testing and 
high jumping. 

We can think of a high jump competition 
in the following way. The jumps are ordered 
by height, from the lowest to be considered 
to the highest. Competitors try to jump at 
each height, and are scored by the number 
of jumps they clear. A competitor's score 
would not depend on the order in which 
jumps were attempted. To see this, consider 
a person who has the ability to jump 1.5 m 
high, and a contest in which the jumps are, 
in order of height, 1 m, 1.5 m, and 2 m. That 
person will succeed on two jumps, regard¬ 
less of the order in which they are presented. 
An Olympic-level athlete would almost cer¬ 
tainly succeed on all three, and if I were to 
compete, I think I could manage to clear the 
lowest jump. 
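
The scoring rule in this example is simple enough to write down directly. The sketch below assumes, as the analogy does, that a jumper clears every bar at or below his or her ability and no bar above it; the function name and the list of abilities are hypothetical.

```python
from collections import Counter


def bars_cleared(ability_m, bar_heights_m):
    """Raw score: number of bars at or below the jumper's ability (meters)."""
    return sum(1 for height in bar_heights_m if height <= ability_m)


bars = [1.0, 1.5, 2.0]

# The score does not depend on the order in which the bars are presented.
score_in_order = bars_cleared(1.5, bars)
score_shuffled = bars_cleared(1.5, [2.0, 1.0, 1.5])

# Hypothetical abilities (meters) for a small group of jumpers.
abilities = [0.9, 1.2, 1.6, 1.7, 1.8, 2.1]
score_counts = Counter(bars_cleared(a, bars) for a in abilities)
print(score_in_order, score_shuffled, dict(score_counts))
```

Changing either the list of bars or the list of abilities changes the tally of scores, even though the scoring rule itself stays the same.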

In a cognitive test each question is associ¬ 
ated with a level of difficulty, just as the 
height of a bar is associated with diffi¬ 
culty in high jumping. The “raw score” on 
a test is the number of questions the per¬ 
son can answer correctly. (If the test con¬ 
tains multiple-choice questions there has to 
be a correction for guessing, but this is easily 
made.) 
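
The text does not say which correction is meant. One classic choice, assumed here purely for illustration, is formula scoring: subtract from the number of right answers the number of wrong answers divided by one less than the number of choices per question. Omitted questions are neither rewarded nor penalized. The function name is hypothetical.

```python
def corrected_score(num_right, num_wrong, choices_per_item):
    """Formula score: right answers minus a penalty for wrong answers.

    With k choices per item, blind guessing yields about one right answer
    for every (k - 1) wrong ones, so wrong / (k - 1) estimates the number
    of right answers that were obtained by luck.
    """
    return num_right - num_wrong / (choices_per_item - 1)


# 30 right and 8 wrong on five-choice questions.
print(corrected_score(30, 8, 5))

# Pure guessing on 100 five-choice questions: about 20 right, 80 wrong.
print(corrected_score(20, 80, 5))
```

Under this rule a person who guesses blindly on every question ends up with an expected score of zero.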

What would the distribution of scores be 
over the population of people who either 
compete, in high jumping, or take the test, 
in intelligence research? That depends upon 
two things; how high we set the bar (or how 
difficult we make the items) and how skilled 
the competitors (test takers) are. 

If we were testing male high school-level 
competitors using the 1, 1.5, and 2 m bars, 
virtually everyone would clear the lowest 
bar, most would clear the middle bar, and 
a few would clear the highest bar (a little 
below the US national high school record).
The distribution of scores would contain
very few 0's, some 1's, mostly 2's, and a
few 3's. We could change the distribution of 
scores by changing competitors. The score 
for male college jumpers would be mostly 
3's, for female jumpers almost all 2’s (for the 
highest bar is just under the world women’s 
record). Or we could change the distribu¬ 
tion of scores by changing the height of the 
bars. Suppose the bars were set at 1.4, 1.6, 
and 1.8 meters. The scores for male college 
jumpers would pile up in the 3's, and we 
would begin to see more 3’s in the women’s 
competition. 

We could do exactly the same thing 
in cognitive testing, except that we would 
manipulate item difficulty instead of the 
height of the high jump bar. For a given 
population (high school students, military 
recruits, etc.) the distribution of raw scores 
(numbers of questions answered correctly) 
depends upon how many items the test con¬ 
tains at different levels of difficulty. For a 
given test, the distribution of scores depends 
upon the distribution of cognitive skills in 
the population. 

Many cognitive tests are constructed so 
that they yield a normal distribution of 
test scores in their intended population. 
For instance, SAT scores are approximately 
normally distributed over the population 
of people who apply to American colleges 
and universities. This is a selected group 
of individuals of higher-than-normal intel¬ 
ligence. By contrast, scores on the WAIS are 
intended to reflect a person’s intelligence 
relative to all people in the population. A 
student who receives a near-average score 
(somewhere around 1100 for the combined
scales) on the SAT-I would probably have
an IQ score of over 100, simply because the 
mean intelligence level of the high school 
students who apply to college is higher than 
the mean intelligence level of all people in 
that age group. 

As a result, raw scores on intelligence 
tests are (approximately) normally dis¬ 
tributed because the questions on IQ tests 
have been selected to produce a normal 
distribution! Contrast this to height, which 


(within sexes) is approximately normally 
distributed. Since the measurement pro¬ 
cedure for height is dictated by a theory 
of what length is, the fact that height is 
normally distributed is a discovery about 
nature. The fact that IQ and similar scores 
are distributed normally is a consequence of 
the way the tests are constructed. 

There are marked advantages in requir¬ 
ing that test scores be normally distributed 
over an appropriate population. The statis¬ 
tical procedures for dealing with normally 
distributed scores are well known. If scores 
are normally distributed, the standard score 
metric (described in Panel 1.1) provides a 
convenient way of comparing individuals 
to each other in terms of different cogni¬ 
tive skills, measured on the same popula¬ 
tion. Suppose that we are told that a par¬ 
ticular individual had a score of 555 on the 
verbal section of the SAT-I and a score of
580 on the mathematics section. This does 
not tell us too much, for the scores on the 
two parts of the test are not comparable. 
By contrast, if we were told that the indi¬ 
vidual had a standard score of 0.1 on the 
verbal test and 0.0 on the mathematics test, 
we would know immediately that the per¬ 
son scored just slightly above the mean on 
the verbal test and received the mean score 
on the mathematics test. 

When scores are normally distributed 
there is a well-known translation from standard
scores to percentiles. 37 For instance, for
a normally distributed set of test scores, a
person with a standard score of 1 will have
a score that is above the scores of approximately
84% of the population. Such interpretations
are often useful. Since IQ scores
are simply a translation of standard scores to 
a scoring algorithm with a mean of 100 and 
standard deviation of 15, compared to standard
scores with a mean of 0 and a standard
deviation of 1, the same sorts of statements 
about distributions can be made for scores 
using the IQ metric. 
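
These translations can be carried out with the normal distribution in Python's standard library. The sketch below assumes exactly what the text assumes, namely that scores are normally distributed, with the IQ metric set to a mean of 100 and a standard deviation of 15. The function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)


def percentile_from_z(z):
    """Percentage of the population scoring below standard score z."""
    return 100 * STANDARD_NORMAL.cdf(z)


def iq_from_z(z):
    """Translate a standard score to the IQ metric (mean 100, SD 15)."""
    return 100 + 15 * z


def z_from_iq(iq):
    """Translate an IQ score back to a standard score."""
    return (iq - 100) / 15


print(round(percentile_from_z(1.0), 1))  # a standard score of 1
print(iq_from_z(1.0))                    # the same score as an IQ
print(z_from_iq(130))                    # an IQ of 130 as a standard score
```

A standard score of 1 corresponds to an IQ of 115 and falls at roughly the 84th percentile.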

37 The same thing would be true for any known dis¬ 
tribution of test scores, but the conversion may be 
more complicated. The normal (Gaussian) distribu¬ 
tion is, by far, the most widely used. 
2.6.4. Scoring: An Alternative
to Using Raw Scores 

Raw scores are easy to compute and 
understand. Item Response Theory (IRT), 
mentioned earlier, provides an alternative 
method of scoring that has the advantage of 
depending much less on an arbitrary selec¬ 
tion of item difficulties in order to force 
a normal distribution of scores, but has 
the disadvantage of being much harder to 
understand! I shall present the chief results
of IRT, without attempting to present the 
mathematics. 38 

Return, for the last time, to the anal¬ 
ogy between intelligence testing and a high 
jump competition. This time, though, sup¬ 
pose that the judges have lost their measur¬ 
ing devices, so they don't know how high 
the bars are! They can still conduct the meet,
and assign sensible scores to the competitors. 
Here is how this would work. 

We can safely assume that every competi¬ 
tor has a trait, jumping ability. Every bar, 
although of unknown height, has a quality 
we will call jumping difficulty. We can mea¬ 
sure jumping difficulty directly, by seeing 
what percentage of people can clear a bar, 
even if we do not know how high that bar 
is. The insight of IRT is that jumping ability 
and jumping difficulty must have the same 
scale. 

We now make an assumption - that in 
some reference population jumping ability is 
distributed normally, with a standard score 
mean of 0 and a standard deviation of 1.
Arbitrarily, let us decide that the population
of male high school students will be
the reference population. We find the bar
(of unknown height, it doesn't matter!) such
that half of all high school students can clear
this bar. By our assumption of the normal
distribution, half of all high school students
have a jumping ability above the mean (that is,
above a standard score of 0), and half below.
Therefore, the bar that just half the students 

38 The basics of IRT were developed in the 1960s
(Birnbaum, 1968). The method did not become feasible
until the advent of modern high-speed computing.
See Embretson & Reise (2000) and E. Hunt
(2007) for further discussions.


can clear must have a jumping difficulty 
score of 0. By the same token, if approximately
16% of the students clear a second,
higher bar, then the second bar must have a 
jumping difficulty of 1. Why? Because, by 
the properties of the normal distribution, 
16% of the population has a standard score 
above 1. 

If we carry out the above norming proce¬ 
dure for, say, thirty bars of different heights, 
we will have a test of thirty items, each of 
which has a jumping difficulty defined by the
percentage of people in the reference pop¬ 
ulation who cleared each bar. Note that the 
raw scores (number of bars cleared) would 
not necessarily follow a normal distribution. 
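
The norming step can be sketched in a few lines. The code below is only the percentage-based idea described in the text, not a full IRT estimation procedure, and it relies on the stated assumption that ability in the reference population is normally distributed with mean 0 and standard deviation 1. The function name is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)


def difficulty_from_pass_rate(pass_rate):
    """Difficulty of a bar (or item) cleared by a given share of people.

    If a fraction pass_rate of the reference population clears the bar,
    its difficulty d is the standard score with that fraction above it,
    so d is the point where the cumulative share equals 1 - pass_rate.
    pass_rate must lie strictly between 0 and 1.
    """
    return STANDARD_NORMAL.inv_cdf(1 - pass_rate)


print(difficulty_from_pass_rate(0.50))            # half clear the bar
print(round(difficulty_from_pass_rate(0.16), 2))  # about 16% clear it
```

A bar that everyone or no one clears has no finite difficulty on this scale, which is one reason norming needs items spread across the ability range of the reference population.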

The analogy to cognitive testing is exact. 
We take a reference population, assume that 
the cognitive ability of interest (intelligence, 
verbal ability, etc.) is normally distributed 
in the reference population, and infer diffi¬ 
culty levels for questions by observing the 
percentages of people in the reference population
who can answer each question.

The resulting test can then be used to 
measure ability levels in populations other 
than the reference population, using the 
scale derived from norming in the reference 
population. In the high jumping example, 
we could compare the scores of college-level 
women competitors to those of high school- 
level women competitors, using the scale 
established by the high school men! In an
intelligence application, we could compare 
the intelligence level of, say, Harvard stu¬ 
dents to Yale students using a scale estab¬ 
lished at Stanford. 

The only arbitrary assumption is that the 
underlying ability is normally distributed 
in the reference population. If the refer¬ 
ence population represents a wide range of 
ability, this assumption can be defended. 
If intelligence is produced by the cumula¬ 
tive effects of many different causes - rang¬ 
ing from inheriting good genes to going to 
a good school - and if these causes are 
independent of each other and no one of 
them has a huge effect, then intelligence will 
be distributed normally. This follows from 
demonstrations of ways in which the nor¬ 
mal distribution can be derived. 
2.6.5. The Importance of Norming

Choosing a reference population is an essen¬ 
tial step in test construction, regardless of 
whether the raw score or IRT procedures 
are used to establish test scores. The appro¬ 
priate reference population depends on the 
purpose of the test. The Wechsler tests are 
intended to be used on national populations. 
Therefore, an attempt is made to obtain a 
probability sample of an entire population 
(e.g., all Spaniards, Americans, or Germans 
for the appropriate version of the test) for 
norming purposes. Because these tests are 
intended to be applicable to all ages, the 
norming sample ought to contain a large 
number of people of different ages. This is 
an expensive proposition, so at times various 
compromises are used. For instance, some 
norming studies for the Raven tests were 
carried out in just one city, but the city was 
chosen because it had values on a variety of 
demographic variables that were thought to 
be “typical” of the country in question. 39

The problem of norming is much easier 
when a test is intended for a clearly defined 
subset of a population. The SAT is intended 
primarily for students in the junior or senior 
year of an American high school and who 
intend to try to enter college, so norming is 
carried out in samples of college applicants. 
The ASVAB is normed by taking a popu¬ 
lation sample of all high school students, 
because of a legal requirement that the US 
military define its enlistment goals based in 
part on the distribution of cognitive skills in 
the high school population. 

When we compare studies using differ¬ 
ent tests we have to keep in mind the effect 
of different tests having been normed in dif¬ 
ferent populations. Four of the most widely 
used tests, the WAIS, the Raven tests, the 
SAT, and the ASVAB (or its subsidiary 
scale, the AFQT), are not strictly compa¬ 
rable. Studies have been done in which 
the same people take a pair of these tests. 
The investigators can then develop proce¬ 
dures for converting from one test score to 
another. 40 

39 J. Raven, 2000. 

40 Frey & Detterman, 2004; Koenig, Frey, & Detterman, 2008.


Test developers try to make a test maxi¬ 
mally sensitive to changes in the underlying 
trait (intelligence, verbal ability, or a similar 
cognitive trait) in the middle ranges of the 
reference population. This means that if a 
test is used to study a population that is very 
different from the reference population, the 
test may not do a good job of discriminating 
between people in the new population. To 
see this, imagine that for some perverse rea¬ 
son it was decided that the AFQT and the 
SAT should switch roles; colleges and uni¬ 
versities use the AFQT, while the military 
uses the SAT. It would be hard to distinguish 
between the “better” and the “best” students
applying to universities, because both
groups would be getting very high scores on 
the AFQT, which is markedly easier than 
the SAT. This is called a ceiling effect. The 
military would find it hard to distinguish 
between applicants who were “marginal but
acceptable” and ones who were “unacceptable”
because both groups would be getting
very low scores on the SAT. This is called a 
floor effect. 

Of course, no one is going to switch 
the AFQT and the SAT. However, changes 
in reference populations do occur, and 
can have important practical consequences. 
Panel 2.1 describes one of them. 

2.7. Some Issues Raised by the Use 
of Tests in Personnel Selection 

How does cognitive testing fit into psychol¬ 
ogy, as a science, and into the cultural set¬ 
ting that produced the tests? These ques¬ 
tions require a broader view than is provided 
by a narrow focus on test scores. 

2.7.1. Conflicts of Interest Are Inherent 
in Personnel Selection 

Personnel selection inevitably produces a 
tension between the selecting institution 
and the applicant. The argument is often 
framed as an argument over a test, but ratio¬ 
nally it should be over the decision process 
using the test, not the test itself. The prob¬ 
lem can be illustrated by the case of an idealized
college admission process.
Panel 2.1. The Changing Definition
of Mental Disability 

Intelligence test scores rose steadily for 
most of the twentieth century.* This is 
true both for raw scores and for the 
underlying scores, derived by using the 
IRT methodology. The phenomenon is 
covered in detail in Chapter 9. For the 
moment, just accept the fact. The rise 
has had the unexpected consequence of 
raising questions about what we mean by 
mental retardation. 

Prior to 1992 the American Associa¬ 
tion for Mental Retardation defined retar¬ 
dation as a child’s having an IQ of 
70 or less, measured by the Wechsler 
Intelligence Scale for Children (WISC). 
New norms are established for the WISC 
on a regular basis. The second version, 
the WISC-R, was normed on a sample 
representative of the US population in 
1974; the third version, the WISC-III, 
was normed on a similar sample taken 
in 1991. But the 1974 and 1991 popula¬ 
tions were not intellectually the same. It 
has been estimated that over that time 
period the mean on the full-scale IQ 
score for the WISC shifted upward by 
about five points. Therefore, the “zero 
point” (IQ = 100, standard score = 0)
for the two tests was not at the same point 
on the underlying trait. 

Tomoe Kanaya, a researcher at Cornell 
University, and her colleagues showed 
that this fact has serious implications for 
policies concerning the developmentally 
disabled.† Suppose that a child were to
take the WISC-R in 1990. At that point 
the WISC-R, normed in 1974, would 
be the latest available test. Suppose fur¬ 
ther that the child obtained a score of 74. 
The child would not be classified as dis¬ 
abled. 

Now suppose that the same child had 
been examined in 1992, instead of 1990, 
using the then-latest version of the test, 
the WISC-III. Presumably the child’s 
mental capacities would be the same, but 


the reference point, the mean intelligence 
score, would have been moved up by five 
IQ points. Since IQ is determined by the 
deviation of a score from the population 
mean, the child would receive a score of
69 on the WISC-III, and hence be classified
as developmentally disabled.

A similar example could be con¬ 
structed at the other end of the scale. 
Suppose that in order to qualify for a 
“gifted student” program a child has to
have an IQ of 130. Imagine a child who 
received an IQ score of 133 in 1990, using 
the WISC-R. The child qualifies as gifted. 
If the same child had been tested in 1992, 
using the WISC-III, the IQ score would 
be 128 and the child would not qualify as 
being gifted. 

This forces us to think about what we 
mean by terms like “mental retardation”
and “gifted.” Do we want to define these 
groups by an absolute level of cognitive 
power? If so, qualifying scores should be 
adjusted so that the same people qualify 
as mentally disabled (or gifted), regard¬ 
less of the population on which the test 
is normed. By this definition, if roughly 
2.5% of the population of the US qualified 
as developmentally disabled in 1974 (IQ <
70 by 1974 standards), then only slightly
less than 1% (IQ < 70 by 1974 standards, 
IQ < 65 by 1991 standards) would qualify 
in 1991 - not because the cognitive power 
of the mentally disabled had changed, but 
because the population as a whole had 
become more intelligent. The change in 
the incidence of diagnosed cases of retar¬ 
dation would have major public policy 
implications for, among other things, the 
amount of money needed for programs 
for the mentally disabled. 
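
The arithmetic behind these examples can be checked directly. The sketch below assumes, as the panel does, a five-point upward shift in the population mean between the two normings and a normal distribution of IQ with mean 100 and standard deviation 15. The function names are hypothetical.

```python
from statistics import NormalDist

IQ_SCALE = NormalDist(mu=100, sigma=15)


def renormed_score(old_score, mean_shift):
    """Score on the newer norms, given how far the population mean rose."""
    return old_score - mean_shift


def percent_below(cutoff):
    """Percentage of the norming population with IQ below the cutoff."""
    return 100 * IQ_SCALE.cdf(cutoff)


print(renormed_score(74, 5))         # the child near the disability cutoff
print(renormed_score(133, 5))        # the child near the gifted cutoff
print(round(percent_below(70), 2))   # share below 70 on a test's own norms
print(round(percent_below(65), 2))   # share below 65 on a test's own norms
```

The two percentages come out near 2.3% and a little under 1%, in line with the rough figures quoted in the panel.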

Another argument is possible. Sup¬ 
pose we define anyone who is in the 
contemporary bottom 2.5% of the intelli¬ 
gence distribution as being mentally dis¬ 
abled, regardless of where they would 
stand in the 1974 population. If we do
this, then the people who are classified 
as retarded using the 1991 test would, in 
a sense, be more intelligent than those 
classified using the 1974 test, which seems 
wrong. However, you could argue that if 
the population becomes more intelligent, 
everyday living will become more cognitively
challenging. If this is true, then perhaps
the definition of developmentally
disabled ought to change, because people 
who were just able to keep up with soci¬ 
ety in 1974, those in the IQ 70-75 range 
on the 1974 test, cannot keep up with the 
more complicated society of the 1990s. 
What do you think? 

* Flynn, 1984, 2007. 

† Kanaya, Scullin, & Ceci, 2003.


College and university applicants present 
evidence of their academic accomplish¬ 
ments and skills, often including their SAT 
scores. Figure 2.4 shows the six-year grad¬ 
uation rates of college students as a func¬ 
tion of their entering SAT-I scores. There 
is a 7:1 difference between graduation rates 
in the highest, as compared to the low¬ 
est, SAT score interval. Admission officers 
should always prefer high scorers to low 
scorers. 

The problem can be stated formally. Let $x$
be an applicant’s score on an entrance examination
and let $p(x)$ be the probability that
an applicant with score $x$ will graduate.

The institution should set a minimum
score, $x_{mi}$ ($m$ for minimum, $i$ for institution),
and should reject all applicants with
scores lower than this. The value of $x_{mi}$
depends upon the benefits to the institution,
$B_i$, if the student graduates, and the costs,
$C_i$, to the institution, if the student does not
graduate. For an applicant with evidence $x$
the expected benefits are $p(x)B_i$ and the
expected costs $(1 - p(x))C_i$. The admission
officer should accept the applicant if

$$p(x)\,B_i > (1 - p(x))\,C_i, \tag{2.1}$$

which is equivalent to

$$\frac{p(x)}{1 - p(x)} > \frac{C_i}{B_i}. \tag{2.2}$$


The left-hand side of equation 2.2 is called 
the odds on success, and the right-hand side is 
the cost:benefit ratio. The odds on success are


established by the evidence; the cost:benefit 
ratio is determined by the costs and benefits, 
as seen by the institution. The college should 
set $x_{mi}$ to be the point at which expected
costs equal expected benefits. The institu¬ 
tion should accept the applicant only if the 
score is at or above the institution’s critical 
value. 
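
The decision rule can be written out as a short sketch. The benefit and cost figures below are hypothetical units chosen for illustration, and the function names are not from the text. The last function uses a rearrangement of the same inequality: the odds exceed the cost:benefit ratio exactly when the probability of graduating exceeds cost / (benefit + cost).

```python
def odds_on_success(p):
    """Odds that the applicant graduates, for a probability p below 1."""
    return p / (1 - p)


def should_accept(p, benefit, cost):
    """Accept when the odds on success exceed the cost:benefit ratio."""
    return odds_on_success(p) > cost / benefit


def break_even_probability(benefit, cost):
    """Probability of graduating at which expected costs equal benefits."""
    return cost / (benefit + cost)


# Hypothetical institution: graduation is worth 3 units, failure costs 1.
print(should_accept(0.30, benefit=3, cost=1))
print(should_accept(0.20, benefit=3, cost=1))
print(break_even_probability(benefit=3, cost=1))
```

The minimum score is then whatever test score corresponds to the break-even probability, which depends on how the probability of graduating rises with the score.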

The applicant can reason in a similar 
way. There are individual benefits, $B_a$, for
graduating from a college or university, and
costs, $C_a$, for attending but not graduating.
While the applicant and the institution
should agree on the implications of
the evidence, $p(x)$, and hence agree on the
odds for success, they may disagree (legitimately)
about the cost:benefit ratio. The
applicant should establish a critical value,
$x_c$, and should want to enter an institution
only if his or her test score is above this
value. Because the applicant's costs and benefits
are not the same as the institution’s,
the applicant's critical value, $x_c$, may not be
the same as the institution's minimum value,
$x_{mi}$. The applicant and the institution should
agree about acceptance when the applicant’s 
scores are high enough to be above both crit¬ 
ical values, and should agree about rejection 
when the scores are below both values. Dis¬ 
agreements arise if an applicant's scores are 
between the two critical values. If the appli¬ 
cant’s cost:benefit ratio is lower than the 
institution’s, there will be cases where the 
institution rejects an applicant who, from 
his or her perspective, ought to have been 
accepted. If the applicant's cost:benefit ratio 
is higher than the institution’s, the applicant
will reject an acceptance offer. In each case
the disagreement is over the costs and benefits,
not the implications of the evidence. In
actual arguments over the use of tests in personnel
decisions these different arguments
are sometimes confounded.

Figure 2.4. Six-year graduation rates as a function of entering
SAT-I scores. Horizontal axis: SAT-I score prior to matriculating
(tick labels from 599 to 1299). Source: The College Board, as reported in The
Washington Post, Jan. 27, 2000, p. A11. The data does not reflect the
use of a test of writing, which is now included in the SAT-I.

2.7.2. Missing Evidence: What the Tests 
Do Not Evaluate 

Arguments about the interpretation of cog¬ 
nitive tests take three forms. One, which 
is an argument largely carried out among 
researchers, is over the appropriate analy¬ 
sis of a particular study. While such argu¬ 
ments can be important, they are not the 
ones that catch the public’s eye. Public con¬ 
cerns center on two things: systematic omis¬ 
sion of certain abilities that might be tested 
but often are not, and omission of abilities 
that may be important to cognition but that 
do not fit into the conventional “Drop in
from the Sky” testing format. 

ABILITIES THAT COULD BE TESTED BUT 
OFTEN ARE NOT 

Cognitive tests are designed to cover those 
skills that are required for all the situations 
where the tests are used, at the expense of 


evaluating cognitive skills that are required 
in only some of the applications. The SAT, 
for instance, evaluates cognitive skills that 
are common requirements across the under¬ 
graduate curriculum, but does not evaluate 
skills in, say, music or drawing, which are 
required in only some majors. There is also 
no opportunity for examinees to demon¬ 
strate a cognitive skill unless the examiner 
has decided that it is appropriate. The extent 
to which this is a problem depends upon 
the breadth of the applications for which 
the test is intended. The problem is reduced 
when a test is intended for personnel classi¬ 
fication in a well-defined setting, such as the 
schools or the armed services. In such cases 
test developers can, and do, consult with the 
institution to determine what skills must be 
tested. 

This policy, which is sensible in itself, cre¬ 
ates a problem when a test is either used 
for research purposes, to study intelligence 
in general, or applied in a personnel selec¬ 
tion setting outside of the one for which it 
was originally intended. In the former case, 
insisting that the current format of a test 
(or tests) is the definition of intelligence 
essentially puts a blinder on research efforts, 
because it amounts to agreeing with Boring 
that “intelligence is what the test tests,” thus
ruling out conceptual advances. Critics of 
intelligence research have claimed that this 
sort of thinking has hurt the field. 

When a test is used outside the setting in 
which it was developed there is a substantial 
chance that the test will fail to evaluate some 
aspect of cognition that is, indeed, central in 
the new application. Robert Sternberg has 
referred to this sort of restriction as a failure 
to evaluate practical intelligence, knowledge 
and problem-solving skills that the examinee 
will actually need “on the job.” 41 

Sternberg’s concern over practical intel¬ 
ligence is closely related to his, and others', 
concerns that cognitive tests typically do not 
evaluate social intelligence, that is, knowing 
what to do in various social situations. Eval¬ 
uating social intelligence sounds like a rea¬ 
sonable goal, but there are problems. A test 
can be designed to determine what an exam¬ 
inee thinks is right to do in a particular social 
situation, but that might not be what the 
examinee would do. If we are only inter¬ 
ested in knowledge of the “correct” answer, 
then a test in social intelligence is very close 
to becoming a test of verbal comprehen¬ 
sion. Finally, different segments of the same 
society may disagree about what the correct 
answer is. To take an extreme example, sup¬ 
pose a question on a social intelligence test 
asked what a deeply religious father should 
do if his daughter proposed marrying outside 
her family's faith. People in different cul¬ 
tures will have different views about what 
the right answer is. 

There have been continuing concerns 
over the limited scope of tests of visual- 
spatial reasoning. Intelligence tests seldom 
evaluate examinees’ ability to reason about 
moving objects, as might be required in a 
task as simple as crossing a street in the face 
of traffic, or maintaining one’s orientation 
in the environment. There are substantial 
individual differences in the possession of 
both of these skills. 42 While they are hard 
to evaluate using the conventional paper- 
and-pencil format, appropriate tests can be 

41 Sternberg, 2003. 

42 Hunt, 2002. 


incorporated into computer-controlled cog¬ 
nitive testing. 

ABILITIES THAT DO NOT FIT INTO THE 
CONVENTIONAL TESTING FORMAT

In normal life, thinking is distinguished by
two important characteristics: problem solv¬ 
ing is attempted because the solution has a 
value in itself, and finding the solution may 
require considerable time for planning and 
reflection. For example, US taxpayers are 
highly motivated to file their returns cor¬ 
rectly and on time. Preparing a return takes 
anywhere from an hour to several working 
days. The task will be considerably simpli¬ 
fied if the taxpayer has taken the trouble, 
in advance, to organize records of income 
and expenses. Doing so requires foresight, 
thoughtfulness, and personal discipline. 

A test taker solves problems solely to be 
evaluated, outside of the context of normal 
events. Minutes, not days, are allowed for 
problem solving. There is no way to evaluate 
the examinee’s skill in executing behaviors 
that, by their nature, take time. What might 
some of these behaviors be? 

Learning: Learning, by definition, is 
change in behavior over time. Test develop¬ 
ers may minimize opportunities for learning 
of content during a test because they want 
the difficulty of a question to be determined 
by the characteristics of that question, rather 
than by what an examinee has learned in 
answering other questions. This is called the 
item independence property. It is a corner¬ 
stone of the mathematical basis for modern 
theories of testing and scalability. But item 
independence defeats any attempt to evalu¬ 
ate learning. 

Dynamic testing is a modification of 
conventional testing in which learning is 
evaluated. 44 The idea behind dynamic test¬ 
ing is that the examiner should first deter¬ 
mine the level of difficulty of the hardest 
problem that a person can solve. This is 
said to establish a zone of proximal devel¬ 
opment. The examiner then provides hints 
and suggestions that the examinee can use 

43 Allahyar & Hunt, 2003; Hegarty & Waller, 2005. 

44 Lidz & Elliot, 2000. 
to solve more difficult problems. The procedure
is continued until a point is reached
beyond which the examinee cannot go, 
even with the examiner’s support. The level 
of achievement reached after coaching is 
assumed to be the examinee’s true ability. 

Dynamic assessment is closely related to 
a form of educational testing called formative
assessment. 45 Formative assessment has
been shown to aid in learning complex con¬ 
cepts (e.g., introductory physics) in some 
circumstances. 46 

The value of either dynamic or forma¬ 
tive assessment as an evaluation instrument 
is unclear. To establish the validity of this 
sort of testing as an assessment it would 
be necessary to show that scores obtained 
after coaching are better predictors of future 
performance than scores obtained prior to 
coaching. As far as I know, this has not yet 
been done. 

The ability to reflect: Conventional testing 
does not allow any time for reflection over 
hard problems. When the Enlightenment 
philosopher Jean-Jacques Rousseau defined
intelligence he placed the ability to stick 
with a problem, over time, on a par with the 
ability to think quickly. Reflection and per¬ 
sistence were the characteristics of Abraham 
Lincoln’s style of thought. 47 Should a testing 
paradigm penalize people like Rousseau and 
Lincoln? 

Creativity: Creativity is close to “ability 
to learn” on people’s lists of desirable men¬ 
tal traits. The commonsense definition of 
creativity is that it is a talent for produc¬ 
ing novel and admirable mental products. 
Biographical studies of people whom we all 
agree were creative (e.g., Picasso, Mahatma 
Gandhi, Einstein) invariably mention the 
time that they spent thinking about the 
issues before them. 48 This cannot be eval¬ 
uated in the conventional testing paradigm. 

There have been attempts to develop 
“creativity” tests that do fit into the nor¬ 
mal testing paradigm, by asking examinees 

45 Black and Wiliam, 1998. 

46 Hunt & Minstrell, 1996; Minstrell, 2001. 

47 Goodwin, 2005. 

48 Gardner, 1993b. 


to respond to unusual requests within a 
brief time. Two example questions are “List
as many uses as possible for a brick” and
“Within x minutes, write an essay with the
title ‘The Octopus’s Sneakers.’” Such tasks
evaluate an ability to produce novelty on 
demand. But is this creativity? 

SOME REFLECTIONS ON THE CRITICISMS 
How serious are these limitations of the 
“Drop in from the Sky” paradigm? The 
answer depends upon what one wants to 
use the test for - identifying people who 
have a trait, or understanding the nature of 
the trait. 

Identifying people who have a trait does 
not necessarily require that the trait be eval¬ 
uated directly. Identification is possible if 
there is a statistical association between the 
trait(s) evaluated in the testing situation and 
the trait of interest. It has repeatedly been 
shown that the scores people achieve on 
conventional tests of cognitive ability predict
their ability to learn rapidly in both academic
and nonacademic settings. The relation
between SAT scores and graduation
rates (Figure 2.4) is one example of such an
association.

The same argument applies to creativ¬ 
ity. People whom society considers to be 
creative seldom have low intelligence test 
scores. 49 People with very high test scores 
are more likely to produce creative prod¬ 
ucts than people with average scores or 
even above-average scores. David Lubinski,
Camilla Benbow, and their colleagues
at Vanderbilt University tracked the performance
of people who, at age thirteen, took
the SAT and achieved scores that put them
in the top 1/10,000 (the 99.99th percentile) of
test takers, for that age group. By their thir¬ 
ties members of this group had garnered an 
inordinate number of patents, well-received 
literary works, and appointments to posi¬ 
tions where creativity was expected. 50 

Cognitive tests seldom evaluate reflectiv¬ 
ity and creativity directly, because of the 
inherent limits of the conventional testing 

49 Simonton, 1984. 

50 Lubinski et al., 2006. 
paradigm. However, people who are good
reflective or creative thinkers may be iden¬ 
tified by their scores on cognitive tests. The 
extent to which this is true is an issue to be 
decided by appropriate research, not by an 
attack upon or defense of the characteristics 
of the test. 

2.7.3. Public Policy Issues 

Tests used for personnel selection have pub¬ 
lic policy implications, for the public has a 
legitimate concern about how educational 
and industrial hiring decisions are made. 
Showing that a test is a statistically accurate 
predictor of academic or employment out¬ 
comes only partly answers these concerns, 
as the following example shows. 

In 2001 the President of the University 
of California, Richard Atkinson, proposed 
dropping the use of SAT-I from California’s 
admission program. 51 He was strongly criti¬ 
cal of the use of abstract reasoning tests, and 
especially the use of analogies. 

Atkinson pointed out that applicants 
could improve their scores by taking a spe¬ 
cial coaching course, which provides prac¬ 
tice in solving artificial, out-of-context ques¬ 
tions, such as analogical reasoning items. 
This concerned him for two reasons. One 
was that he thought that the prospective 
student’s time would be better spent study¬ 
ing academic topics. The other was that 
the coaching courses are fairly expensive, so 
using such tests as part of the admissions 
process gives an advantage to students from 
wealthier families, who could afford to have 
their children coached. 52 

51 Speech to the University of California Senate.
Downloaded from www.senate.ucsb.edu/meetings/townhall/sat/abstracts/AtkinsonSpeech.pdf,
10 November 2008.

52 Atkinson constructively proposed using the SAT-II,
which is a slightly better predictor of academic
performance at the University of California than
the SAT-I. SAT-II scores are less highly correlated
with family socioeconomic status (SES) than are
SAT-I scores (Geiser & Studley, 2002). We have
come full circle. Atkinson's concern that the SAT-I
favored the wealthy replaced Conant’s concern,
seventy years earlier, that a test like the SAT-I was
needed to avoid favoring the wealthy.


Generalizing Atkinson’s concerns, when 
a test is used for personnel selection appli¬ 
cants will be motivated to acquire the skills 
the test evaluates. Decision makers who 
adopt a test have to consider whether they 
want to encourage such behavior. It may 
be that only some applicants (here, wealthy 
ones!) will have the opportunity to acquire
the skills. Do the decision makers want to 
construct a system that gives some appli¬ 
cants an advantage over others? 

There is another objection that Atkinson 
could have raised, but did not. Tests that are 
used to make personnel decisions have to 
both be fair (i.e., accurate) and appear to be
fair. The public has little sympathy for appli¬ 
cants to universities who are turned down 
because they cannot demonstrate prepared¬ 
ness in English or mathematics. Applicants 
get more sympathy if the denial is based on 
their inability to answer analogy questions 
such as Car is to Bus as Horse is to: Train, 
Airplane, Donkey, Stage Coach. Students do 
not have to answer such questions once they 
are in college, so why should they have to 
solve them in order to get in? 

Concerns such as these cannot be 
answered by appealing to statistics show¬ 
ing that test scores predict success. The tests 
must be perceived as fair in addition to hav¬ 
ing statistical validity. 

2.8. Summary 

There is no agency that certifies a cognitive 
examination as an intelligence test. Instead 
what we have is a collection of tests of 
cognitive abilities that, to varying degrees, 
assess different aspects of intelligence. This 
chapter has presented some representative 
ones. 

Although different tests have been devel¬ 
oped under different theoretical rationales, 
there is a surprising commonality of content 
across all tests. Methods for evaluating verbal
intelligence, quantitative skills, and figural
reasoning appear over and over again.

Some tests are given individually; others 
are given to groups of examinees. Beginning 
around 1995 there was a movement toward
presenting questions on computer screens. 
While this has advantages in terms of test 
administration, simply putting items on a 
computer does not change the nature of 
the psychological traits being evaluated; a 
vocabulary test is a vocabulary test. 

Reliance on out-of-context, “Drop in 
from the Sky” testing severely limits what 
can be evaluated. Within this paradigm it is 
not possible to evaluate certain intellectual 
traits, such as a capacity for learning, reflec¬ 
tion, or creativity, that certainly ought to be 
considered part of one’s intelligence. This is 
a matter of serious concern. 


When all is said and done, though, the 
tests are imperfect but not negligible pre¬ 
dictors of success in academic and industrial 
situations. Validity coefficients range from 
.30 to .85. 

Perfection is not a reasonable goal. An 
individual’s success is not solely determined 
by his or her personal characteristics; situa¬ 
tional constraints are often important. Nor 
can a test possibly evaluate all the personal 
traits that might be important in every sit¬ 
uation. The fact that there are limits on 
predictive ability does not imply that tests 
should be disregarded. Test scores do reflect 
intelligence. 



CHAPTER 3 


On Theory 


Then I would feel sorry for the good Lord. 
The theory is correct anyway. 

Albert Einstein (in reaction to a
question from a graduate student 
about how he would have felt if 
the theory of general relativity had 
not received experimental 
support) 1 


3.1. Overview 

This chapter and the next discuss theories of 
intelligence. This chapter describes the role 
that theory plays in our efforts to under¬ 
stand intelligence. The discussion is neces¬ 
sary because there has sometimes been a lack 
of clarity about what theories of intelligence 
are and how they are to be used. Hopefully 
the discussion here will provide some prin¬ 
ciples to apply in analyzing specific theories, 
as is done in Chapter 4. 

The current chapter begins with a brief 
discussion of theory in general. It then 

1 Cited in Shapiro, 2006. 


moves to a discussion of some special prob¬ 
lems facing theories of intelligence. Finally, 
I contrast two different goals for theories in 
psychology, and consider why these goals 
may produce different theories. 

3.2. Scientific and Nonscientific 
Theories 

Outsiders sometimes see science as a process 
of collecting facts. Theories are seen as sec¬ 
ondary. Sometimes they are even derided. 
For example, proponents of intelligent design, 
the belief that life on Earth is the result 
of the actions of some all-powerful being, 
have insisted that schoolchildren be taught 
that the theory of evolution is “only a the¬ 
ory.” This is not meant as a compliment to 
Charles Darwin. Less obviously, but perhaps 
more damagingly because the practice is 
so widespread, science courses below the 
college/university level too often emphasize 
memorization of facts and formulae, while 
deemphasizing the theoretical explanations 
that are offered to explain why these facts 
are facts, and why the formulae work.

Theories are the glue that holds scientific
observations together, for they summarize 
the chains of cause and effect that scientists 
use to understand how the world works. The 
theory of evolution, for instance, is a mar¬ 
velously lucid explanation of why life on 
Earth is so diverse. Models of physical phe¬ 
nomena can be, and are, used to design com¬ 
plicated aircraft that fly (and land safely) on 
the very first attempt. Theory lets us look 
back to explain why something happened 
the way it did, and to look forward to predict 
what will happen under various assumptions 
about conditions in the future. 

Theories are not unique to science. 
Many statements in philosophy, popular dis¬ 
course, and religion are theoretical state¬ 
ments. The biblical explanation for plagues 
in Egypt amounts to a theory; the ancient 
Egyptians acted in a way that displeased 
Jehovah, who reacted with plagues of frogs, 
locusts, and disease. Subsequently Jehovah 
wished that a certain favored people, the 
Israelites, be permitted to leave Egypt. The 
waters of the Red Sea rolled back to let 
the Israelites pass, and then rushed for¬ 
ward again, engulfing the Pharaoh’s army, 
in order that the Israelites might be safe. 
These are statements of cause and effect, and 
statements of cause and effect constitute a 
theory. 

Scientific theories differ from nonscientific
theories in three ways. The first is that
they must refer to variables that are measur¬ 
able in principle. The variables in a theory 
do not need to be measurable at the time 
that the theory is presented. The history of 
modern physics contains several instances in 
which theoreticians postulated the existence 
of subatomic particles before the technol¬ 
ogy to measure them had been developed. 
However, no variable can be admitted to a 
scientific theory if, in principle, that variable 
can never be measured. The theory of intelli¬ 
gent design is not a scientific theory because 
it postulates the existence of a being whose 
motives and powers influence our world, but 
whose nature cannot be known. 

The second distinction between scientific 
and nonscientific theories is that scientific 


theories must be interpretable by objective 
means. Inspired gurus are not permitted, 
and personal faith is never to be confused 
with evidence. This does not mean that 
anyone, without training, should be able to 
understand a scientific theory. The necessary 
material may be hard to master, but the rules 
of mastery have to be open and nonmystical. 
Consider the following two cases. 

Albert Einstein's theory of relativity was 
arguably the most important development 
in scientific theory in the twentieth century. 
When Einstein presented the theory of rel¬ 
ativity he had to present a chain of assump¬ 
tions and deductions that other, appro¬ 
priately trained, scientists could (and did) 
follow. The theory was accepted by physi¬ 
cists on the basis of their analysis of Ein¬ 
stein's ideas, not because they attributed any 
mystic properties to Einstein himself. Contrast
this with the decision of Saul of Tarsus
to accept Jesus as the Messiah and begin a
new career as St. Paul, the Apostle to the 
Gentiles. St. Paul never claimed to have ana¬ 
lyzed Jesus’s philosophy; he claimed to have 
received a revelation from God. 

The third distinction between scientific 
and nonscientific theories is that scientific 
theories must account for data. A scientific 
theory is vacuous unless it implies that cer¬ 
tain observable events will happen - or, in 
more formal terms, that certain patterns will 
occur in data. There is no requirement that 
the observations be easy to make, or even 
that they be possible at the time that the the¬ 
ory is presented. Several of the predictions 
of Einstein's theory of relativity have only 
recently been evaluated, using technolo¬ 
gies that were not available in his lifetime. 
The requirement for prediction does mean 
that the theory must be stated with suffi¬ 
cient precision that, given the opportunity 
to observe, we can tell whether the predic¬ 
tion was confirmed or denied. Newton’s the¬ 
ory of motion predicted that large and small 
objects will fall at the same speed in a vac¬ 
uum, regardless of their mass. And they do. 
The prophecies of the Delphic Oracle, Nos¬ 
tradamus, and other seers are notorious for 
their ambiguity.

3.3. Choosing between Competing
Theories 

Theories imply observations, not the other 
way around. When a theory correctly pre¬ 
dicts an observation, this is evidence in sup¬ 
port of the theory, but evidence can never 
prove that a theory is correct. However, if an 
observation is made that contradicts a prediction,
then the theory has been disconfirmed.
Accordingly, in the abstract, we do
not ever determine the correct theory; all we 
do is to eliminate wrong ones, and continue 
to believe, with reservations, in those theo¬ 
ries that have not yet been disconfirmed. 

At this point statistics rears its ugly head. 
If theories always made exact predictions, 
and if we could be absolutely certain that 
our observations were accurate, then science 
would proceed in the way just described. In 
practice, that is not what happens. Predic¬ 
tions are often made in general terms, and 
measurements are never exact. Therefore, 
empirical results seldom completely rule out 
a theory. Instead, what we find is that the 
facts are more compatible with one the¬ 
ory than with another. This will modify our 
belief in the “winning” and “losing” theories, 
but not necessarily cause us to accept the 
winner and reject the loser. Indeed, it often 
happens that as more and more evidence 
accumulates our belief in all theories under 
consideration falls, and we have to develop 
new theories. 
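
The way evidence shifts belief among competing theories, without settling the matter, can be sketched with Bayes' rule. The fragment below is only an illustration under stated assumptions: the hypotheses, the prior probabilities, and the likelihoods are invented numbers, and the function name update_beliefs is hypothetical. The "something_else" entry stands for theories not yet formulated.

```python
def update_beliefs(priors, likelihoods):
    """Return posterior probabilities via Bayes' rule.

    priors: dict mapping theory name -> prior probability (summing to 1)
    likelihoods: dict mapping theory name -> P(observation | theory)
    """
    unnormalized = {name: priors[name] * likelihoods[name] for name in priors}
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


# Hypothetical prior beliefs, chosen only for illustration.
priors = {"theory_A": 0.45, "theory_B": 0.45, "something_else": 0.10}

# Case 1: the observation is somewhat more compatible with A than with B.
favors_a = update_beliefs(
    priors, {"theory_A": 0.30, "theory_B": 0.15, "something_else": 0.20}
)

# Case 2: neither named theory predicted the observation well.
favors_neither = update_beliefs(
    priors, {"theory_A": 0.02, "theory_B": 0.01, "something_else": 0.20}
)

for label, posterior in (("case 1", favors_a), ("case 2", favors_neither)):
    summary = ", ".join(f"{name}={p:.2f}" for name, p in posterior.items())
    print(f"{label}: {summary}")
```

In the first case belief in the "winning" theory rises but the "losing" theory is not eliminated. In the second case belief in both named theories falls, which is the situation that calls for new theories.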

One of the most hotly debated topics 
in intelligence, the Nature-Nurture debate, 
provides a good example. In its pure form, 
the “Nature” side of this debate asserts 
that intelligence is inherited genetically. 
The “Nurture” side states that intelligence 
is obtained through experience, with the 
corollary that early childhood experiences 
are particularly important. Let us look at the 
logic of the argument, saving the details for 
Chapters 8 and 9. 

Galton, who was firmly on the Nature 
side of the debate, observed that the emi¬ 
nent men and women of his own gener¬ 
ation were highly likely to have had emi¬ 
nent parents or grandparents. Eventually 
he identified families who produced people 


of eminence, generation after generation. 2 
Galton concluded that intelligence is largely 
inherited. 

The hypothesis that intelligence is inher¬ 
ited implies that there will be a statistical 
association between the eminence of fam¬ 
ilies from one generation to the next. The 
“null hypothesis” is that familial eminence 
varies randomly from one generation to the 
next. Galton found that the generation-to- 
generation associations of eminence were 
too strong to be accounted for by ran¬ 
dom variation. Accordingly, Galton had 
a strengthened belief in the hypothesis 
that intelligence (and hence eminence) is 
inherited. 

Galton was correct that the data should 
strengthen his belief in the Nature (hereditarian)
hypothesis relative to the null hypothesis
of no familial association. The position
relative to the Nurture (environmental) 
hypothesis is different. Most people inherit 
both their genetic composition and their 
social environment from their parents. 
Therefore, both the Nature and Nurture 
hypotheses predict associations of familial 
eminence from generation to generation. 
The association that Galton found was evi¬ 
dence for the Nature or the Nurture hypoth¬ 
esis against the null hypothesis that fami¬ 
lies don't matter in achieving eminence, but 
it was not evidence that could be used to 
decide against either Nature or Nurture. 

As a side point, this example shows 
that, in general, simply testing a hypothe¬ 
sis against the null hypothesis is not likely to 
be very informative. 
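
The logic of the Galton example can be illustrated with a small simulation. This is a sketch under invented assumptions, not a reconstruction of Galton's data: every person has a genetic factor and a social environment, both are passed from parent to child with the same arbitrary strength (0.7), and the three hypotheses differ only in which factor, if any, produces eminence. All function names and coefficients are hypothetical.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def eminence(genes, environment, luck, hypothesis):
    """Eminence under one hypothesis; 'luck' is unrelated to family."""
    if hypothesis == "nature":
        return genes + luck
    if hypothesis == "nurture":
        return environment + luck
    return luck  # null hypothesis: families do not matter


def parent_child_correlation(hypothesis, n_families=20000, seed=1):
    rng = random.Random(seed)
    transmission = 0.7  # assumed parent-to-child resemblance for BOTH factors
    residual_sd = math.sqrt(1 - transmission ** 2)
    parents, children = [], []
    for _ in range(n_families):
        p_genes, p_env = rng.gauss(0, 1), rng.gauss(0, 1)
        # Children inherit both genes and social environment from parents.
        c_genes = transmission * p_genes + rng.gauss(0, residual_sd)
        c_env = transmission * p_env + rng.gauss(0, residual_sd)
        parents.append(eminence(p_genes, p_env, rng.gauss(0, 1), hypothesis))
        children.append(eminence(c_genes, c_env, rng.gauss(0, 1), hypothesis))
    return pearson_r(parents, children)


results = {h: parent_child_correlation(h) for h in ("nature", "nurture", "null")}
for hypothesis, r in results.items():
    print(f"{hypothesis:8s} parent-child r = {r:.2f}")
```

Under these assumptions the Nature and Nurture versions both yield a parent-child correlation near .35, while the null version yields a correlation near zero. Rejecting the null therefore says nothing about which of the two substantive hypotheses is correct.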

Proponents of the Nurture hypothesis 
can point to studies that show that children 
raised in unfavorable environments perform 
poorly in school and on intelligence tests, 
at least during the early school years. On 
its face, this evidence seems to support 
the importance of the environment. How¬ 
ever, proponents of the Nature hypothe¬ 
sis can point out, correctly, that such chil¬ 
dren are not a random sample of children in 
their generation. Indeed, there is reason to 
believe that many of them are the children 

2 Galton, 1869.

Figure 3.1. Genetic and environmental influences determine the
extent to which the intelligences of any two individuals, A and B, 
resemble each other. This, in turn, will determine the resemblance 
between test scores and other cognitive behaviors of the two people. 
We can only observe correlations between manifest variables - test 
scores and other cognitive behaviors. In addition, genetic theory can 
be used to specify the degree of resemblance between the genetic 
potentials of A and B. The other resemblances between latent 
variables must be inferred. 


of parents who were not terribly intelligent. 
The data does not discriminate between the 
hypotheses. 

By the middle of the twentieth century 
the debate had become more refined. Scien¬ 
tists were willing to acknowledge that both 
heredity and environment do have a role, 
and that the issue is not to look for a single 
cause for intelligence, but rather to deter¬ 
mine the relative influences of environment 
and heredity. At that point a much more 
sophisticated model of heredity began to 
be used. It is diagrammed in Figure 3.1. Its 
basic elements are (a) that there are both 
genetic and environmental components to 
intelligence and (b) that for any two individuals,
the extent to which they share the
genetic component of intelligence will be 
determined by their genetic relation, and 
the extent to which they share environmen¬ 
tal components will be determined by their 
environmental relation. 

While it is not immediately obvious, diagrams
such as Figure 3.1 (and their associated
mathematics) can be used to generate


expected correlations between test scores. 
The values of the correlations depend upon 
the strength of the connecting links. Some¬ 
times these strengths can be predicted from 
genetic theory. In a study of identical twins, 
for instance, the genetic link would have 
a strength of 1, indicating identical genetic 
constitutions. The link for fraternal twins 
would have a strength of .5, because fraternal
(dizygotic, DZ) twins share half their
genetic inheritance. 
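
How link strengths turn into expected correlations can be sketched with the usual path-tracing rule for standardized variables. The fragment below is a simplification of a diagram like Figure 3.1, not the model itself: it assumes one genetic path (a) and one environmental path (c) into intelligence, treats the test score as a perfect indicator of intelligence, and assumes that twins reared together share their environment completely. The names a, c, and expected_correlation, and the numerical values, are hypothetical.

```python
def expected_correlation(genetic_link, environment_link, a, c):
    """Expected correlation between two relatives' scores by path tracing.

    genetic_link: resemblance of genetic potentials (1.0 identical, 0.5 fraternal)
    environment_link: resemblance of environments (assumed 1.0 for twins reared together)
    a: standardized strength of the Genetics -> Intelligence link
    c: standardized strength of the Environment -> Intelligence link
    """
    return genetic_link * a ** 2 + environment_link * c ** 2


a, c = 0.7, 0.5  # illustrative link strengths, not estimates from data

r_identical = expected_correlation(1.0, 1.0, a, c)
r_fraternal = expected_correlation(0.5, 1.0, a, c)

print(f"identical twins: expected r = {r_identical:.2f}")
print(f"fraternal twins: expected r = {r_fraternal:.2f}")
```

With these illustrative values the model expects a higher correlation for identical twins than for fraternal twins, and the size of the gap depends only on the strength of the genetic link.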

More generally, the contrast between 
theories, such as Nature versus Nurture, has 
been replaced in modern studies by a con¬ 
trast between models, which can be thought 
of as assumptions about the strengths of particular
links. The extreme Nature (heredity)
model can be thought of as a specific
case of Figure 3.1 in which the links
from Environment to Intelligence are absent
(strength = 0). We could also investigate
intermediate models - for example, a model 
in which the Environment-Intelligence link 
is required to have half the strength of the 
Genetics-Intelligence link. In all cases we 
would come back to a basic test: which of
the many possible models best reproduces
the correlations between observed variables, that
is, the links between the four rectangular
boxes (observable variables) in Figure 3.1?
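
The competition between models can be sketched in the same simplified terms. The fragment below is an illustration only: the "observed" twin correlations are invented, each model is reduced to a rule that fixes the environmental link strength c given the genetic link strength a, and fit is judged by a crude least-squares grid search. Actual behavior-genetic studies use more elaborate statistical fitting and many more kinds of relatives. All names are hypothetical.

```python
def expected_correlation(genetic_link, environment_link, a, c):
    """Expected correlation by path tracing (standardized link strengths)."""
    return genetic_link * a ** 2 + environment_link * c ** 2


# Invented "observed" correlations, for illustration only.
observed = {"identical": 0.85, "fraternal": 0.60}
genetic_links = {"identical": 1.0, "fraternal": 0.5}

# Each model is a rule that fixes c once a is chosen.
models = {
    "extreme Nature (c = 0)": lambda a: 0.0,
    "intermediate (c = a / 2)": lambda a: a / 2,
}


def fit(model_rule):
    """Grid-search a in [0, 1]; return (best a, smallest squared error)."""
    best_a, best_error = None, float("inf")
    for step in range(1001):
        a = step / 1000
        c = model_rule(a)
        if a ** 2 + c ** 2 > 1:  # explained variance cannot exceed the total
            continue
        error = sum(
            (observed[kind]
             - expected_correlation(genetic_links[kind], 1.0, a, c)) ** 2
            for kind in observed
        )
        if error < best_error:
            best_a, best_error = a, error
    return best_a, best_error


fits = {name: fit(rule) for name, rule in models.items()}
for name, (a, error) in fits.items():
    print(f"{name}: best a = {a:.3f}, squared error = {error:.4f}")

best_model = min(fits, key=lambda name: fits[name][1])
print("better-fitting model:", best_model)
```

Both models here have a single free parameter, so comparing their errors is reasonably fair. With these invented correlations the intermediate model reproduces the data more closely than the extreme Nature model, but different data would favor a different model.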

We return to this example in Chapter 8. 
For the moment, the important thing to real¬ 
ize is that the field has advanced from evalu¬ 
ating broad assertions about the importance 
of heredity or environment, to a far more 
sophisticated competition between mod¬ 
els whose parameters indicate the relative 
contributions of genetic and environmental 
causes of intelligence. 

3.4. System Thinking Complicates 
the Issue 

In Chapter 1, I argued that intelligence is not 
a directly observable trait; it is a latent trait 
that is inferred from observation of various 
manifest behaviors influenced by the trait. 
These include scores on IQ tests, school 
grades, and records of performance on the 
job. No one of these is entirely determined 
by intelligence, but each is influenced by
it. This means that the definition of intelligence
is, in a sense, something we make
up, rather than being a directly observ¬ 
able trait. I also pointed out, in Chapter 1, 
that intelligence is embedded within a com¬ 
plex of other latent variables, including such 
things as genetic inheritance, socioeconomic 
status (SES), and educational opportunity. 
These variables have links between them - 
for instance, the link between the socioeco¬ 
nomic status of one's neighborhood and the 
quality of the schools. 

When variables are linked into compli¬ 
cated systems there are both direct and indi¬ 
rect effects. (These are sometimes called 
proximal and distal influences.) Untangling 
them can be a problem. Here is an example 
dealing with verbal intelligence. Once again, 
I simply assert the facts, leaving the details 
for a subsequent chapter. 3 

3 See Bouchard, 2009, for further analysis of this point.


Reading is one of the most important 
skills children acquire in the early school 
years. Subsequently, reading skills are used 
to facilitate learning. The sooner and more 
completely the shift from learning to read to reading to
learn is made, the more the child can
get out of the educational system. Learn¬ 
ing to read is associated with a child's fam¬ 
ily background. On the average, children 
from families in the middle and upper SES 
strata arrive at school better prepared to 
learn to read than children from a low SES 
background. This advantage is continued 
throughout the school years. Why? 

There is a genetic explanation. To the 
extent that SES is linked to genetic poten¬ 
tial, the children from middle-high SES fam¬ 
ilies may be better learners simply because 
they are biologically smarter. There is also 
an environmental explanation. The parents 
in middle-high SES families, on the average, 
spend much more time reading to their pre¬ 
school children than do parents in low SES 
families. Indeed, some of this reading behav¬ 
ior comes very close to teaching reading, as 
in reading books that tell children that “A is 
for apple” and so forth. 

We are dealing with a set of skills, embed¬ 
ded in a system. Furthermore, it is a system 
with feedback loops, for there is little doubt 
that as children learn to read their parents 
respond, and the manner of the response 
may vary with SES. The situation is depicted 
in Figure 3.2, which is only one of several 
possible models. It is forbiddingly complex. 

Ultimately, every behavior has to take 
its action via the child's genetic poten¬ 
tial. Parental genetic effects are of two 
sorts. Directly, parental genetics determine 
a child’s genetics, and the child’s genetic 
potential, interacting with the social envi¬ 
ronment, determines reading skills. Indi¬ 
rectly, parental genetics will partly produce 
parental child-rearing behaviors. These 
behaviors, influenced by the child's genetic 
potential, will produce the child’s behaviors, 
which may include reading or pre-reading 
skills. The child’s behavior, in turn, influ¬ 
ences future parental behaviors, and the 
cycle continues. And this is only part of

Figure 3.2. Influences on reading skills. Parental genetics will exert a
direct effect on a child’s genetic potential, and also may influence 
the parent’s child-rearing practices (e.g., by increased sensitivity to 
a child’s mood). Parental child-rearing practices establish a social 
environment that, acting through a child’s genetic potential, 
produces a variety of behaviors, including reading skills. These 
behaviors, in turn, alter the parents’ behaviors toward the child. 


the system! I have not shown the effects 
of the parental social environment, which 
also influence parental behaviors, and many 
other influences besides. All the variables are 
enmeshed with each other. 

This is the point at which many peo¬ 
ple begin calling for “controlled experi¬ 
ments.” To continue the example of reading 
skills, it has been shown that extensive pre¬ 
school programs, ones that are far more 
intense (and expensive) than the typical 
“Head Start” program, will improve chil¬ 
dren’s cognitive skills during the early school 
years. There is even some evidence that 
the improvements last, in much weakened 
form, into adulthood. The studies in question
compared the performance
of children who had participated
in an intensive program with that of a
matched group who had not, the classic
experimental group versus control group 
study. The details will be given in Chapter 9. 
Such studies show that direct manipulation 
of a proximal variable, the child's social 
environment, can have an influence on the 
development of intelligence. However, this 
sort of study does not show that genetic 


inheritance can be ignored. The reason that 
it does not is that random assignment of 
individuals to experimental and control 
groups, without measurement of, say, 
parental intelligence, effectively destroys 
any possible correlation between genetic 
potential and the child’s performance, as an 
artifact of the experimental design. 

Suppose that we improve upon the 
intervention design, by measuring parental 
intelligence and, by appropriate statistical 
manipulations, estimating how much the 
observed improvement in the child’s per¬ 
formance can be attributed to the parent's 
intelligence, how much can be attributed 
to the intervention, and how much is due 
to some interaction between the two. Such 
designs have been approached, especially in 
the study of adopted children (Chapter 8). 
The resulting information is useful. How¬ 
ever, it does not tell us what the controlling 
variables are in the world as it is, for we 
would be studying a “world as it might be,” 
in which there are massively financed pre¬ 
school programs. 

The proper design for a study of intel¬ 
ligence depends upon what question we 
are trying to answer. It turns out that the
proper theoretical model may also depend 
upon the question at hand. Intelligence, as 
a concept, can be introduced into different 
systems. Some questions deal with control 
issues in social systems - either how intelli¬ 
gence might be controlled within social sys¬ 
tems, or how changing intelligence might 
alter those systems. Studies of the role of 
intelligence in education, personnel selec¬ 
tion, and industrial operations focus on 
control issues. Other studies deal with reduc¬ 
tion issues in biological systems - how intel¬ 
ligence can be derived from some more 
basic phenomena. Studies of the relation 
between intelligence and the brain, genetic 
influences on intelligence, and studies of the 
relation between physiological states and 
intelligence are reductionist. We want to 
conceptualize intelligence in a way that fits 
into the system being studied. 

3.5. Intelligence as a Construct 
in Social Systems 

We need theories and models so that we 
can think about how things work. We use 
them all the time to understand (not always
correctly) how things like electrical circuits
and mechanical pulley-and-lever systems
function. 4 On a grander scale, economists
argue (not always correctly) about the right
model for the world economy, while epi¬ 
demiologists use models (hopefully cor¬ 
rectly) to forecast the seriousness of an out¬ 
break of influenza. 

Why should models be so ubiquitous? 
David Geary 5 has argued that humans have 
evolved a capacity to construct models of 
the world in order to satisfy an inherited 
drive to exert control over their environ¬ 
ment. In order to accomplish this goal the 
various components of the model have to fit 
together in a way that we can think about. 
This means that we may have to conceptu¬ 

4 Gentner & Gentner, 1983; Hegarty, Just, & Morrison,
1988.

5 Geary, 2005, 2007a. 


alize the same thing in somewhat different 
ways, depending upon the world in which 
the thing is embedded. That is certainly true 
of intelligence. 

Here is a case study. 

During the 1970s two Stanford Univer¬ 
sity educational psychologists, Lee Cron- 
bach and Richard Snow, studied aptitude x 
treatment interactions in education. 6 They
concluded that classroom environments that 
encourage active exploration and experi¬ 
mentation benefit students with high cog¬ 
nitive abilities, while students with lower 
cognitive abilities may do better in teacher- 
structured learning environments. Their 
findings have been replicated in situations 
as different as college students learning 
statistics 7 and elementary school children 
learning to read. 8 
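
An aptitude x treatment interaction of this kind can be pictured as two regression lines that cross. The sketch below is only an illustration under stated assumptions: outcomes are taken to be linear in ability, the coefficients are invented, and the names `STRUCTURED`, `EXPLORATORY`, `crossover`, and `better_method` are hypothetical rather than anything Cronbach and Snow proposed.

```python
def predicted_outcome(ability, intercept, slope):
    """Linear prediction of a learning outcome from ability (standard-score units)."""
    return intercept + slope * ability


# Invented coefficients: structured teaching yields similar results at all
# ability levels, while exploratory teaching pays off more as ability rises.
STRUCTURED = {"intercept": 52.0, "slope": 2.0}
EXPLORATORY = {"intercept": 50.0, "slope": 6.0}


def crossover(a, b):
    """Ability level at which the two lines predict the same outcome."""
    return (a["intercept"] - b["intercept"]) / (b["slope"] - a["slope"])


def better_method(ability):
    """Name the teaching method with the higher predicted outcome."""
    structured = predicted_outcome(ability, **STRUCTURED)
    exploratory = predicted_outcome(ability, **EXPLORATORY)
    return "exploratory" if exploratory > structured else "structured"


print(crossover(STRUCTURED, EXPLORATORY))
print(better_method(-1.0), better_method(1.0))
```

The point of the sketch is that neither method is better in general; which one is preferable depends on where the student falls relative to the crossover point.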

Conceptually, Cronbach and Snow 
developed a model of the relationship 
between psychometrically defined general 
intelligence and educational practices. The 
variables they studied, psychometric scores 
and teaching methods, were measures and 
actions available to teachers. Statements 
about, say, the behavioral implications of 
the level of neural activation of children’s 
forebrains during functional magnetic reso¬ 
nance imaging (fMRI) would be of little use 
in the classroom. 

If we were to embed the cognitive behaviors
that we call “intelligence” into another
system, we might want quite a different 
conceptualization. Imagine a (hypothetical) 
study in which the investigator is interested 
in the use of drugs to ameliorate cognitive 
deterioration during aging. In that situation 
relating intelligence to the level of neural 
activation in a patient’s forebrain might be 
a reasonable thing to do. 

If a scientific model is to be used to inform 
decision makers - that is, to control a system -
then the theory must be stated in
terms of variables that the decision maker 
can measure and control. 

6 Cronbach and Snow, 1977; Snow, 1982. 

7 Shute et al., 1996. 

8 Freebody & Tirre, 1985. 
[Figure 3.3. Surviving diagram labels: Lifetime experiences; Genetic inheritance.]

Figure 3.3. The levels at which theories of intelligence may be
stated. Rectangles indicate physically measurable variables;
ellipsoids indicate hypothetical variables.


3.6. Reductionism 

Using scientific models to control variables 
is something of an engineer’s view of the sci¬ 
entific enterprise. It is valid, it is important, 
and it is not the only use of theory in science. 

The other use is pure understanding. 
Again the system concept is useful. A system 
is understood when it is completely closed, 
so that its behavior can be understood 
entirely from the actions of its elements. 
This produces an ordering of scientific top¬ 
ics: the properties of atoms are derived 
from the properties of subatomic particles 
and forces; the properties of chemical com¬ 
pounds are derived from their atomic struc¬ 
ture; the properties of biological systems 
are derived from the chemical and physi¬ 
cal properties of their components, and so 
on. The sequence is called reductionism. 

Reductionist analyses have important 
implications for the study of intelligence. 
David Wechsler, the developer of the 
Wechsler tests, pointed out that intelligence 
is revealed by behavior. 9 Behaviors that 
we see as exercises of intelligence (or lack
thereof) must be under the control of the
brain. What is the alternative, the influence 

9 Wechsler, 1975. 


of good and bad angels? Therefore, theories 
of intelligence that relate individual differ¬ 
ences in cognitive power to individual differ¬ 
ences in brain action are important as steps 
in reductionism. It is not clear that such the¬ 
ories would be manageable, let alone useful, 
in understanding how individual differences 
in intelligence influence academic learning 
or on-the-job performance. Models of these 
phenomena might well take as their primi¬ 
tive terms concepts such as general intelli¬ 
gence, verbal comprehension skill, and spa¬ 
tial orientation ability without any concern 
about how these abilities are produced by 
the brain - just as Newtonian mechanics 
accepts gravitational force as a primitive fact 
without further explanation. 

Figure 3.3 diagrams the reductionist view 
of research on intelligence. All performance 
depends on the brain. At any moment the 
brain is the product of both a person’s 
genetic inheritance and his or her environment.
“Environment” has to be interpreted
in the broadest manner. It does not just refer 
to obvious physical factors, such as nutri¬ 
tion or injury. It also refers to social factors, 
including education. Why? Because learn¬ 
ing produces physical changes in the brain. 
This includes the storage of information, 
but there is more to it than that. The same
external task may be processed by the brain
in different ways before and after learning. 

While the brain is ultimately the only 
thing that actually does something, it is often 
useful to deal with the results of brain action 
at the functional level rather than at the 
level of brain action itself. To illustrate, con¬ 
sider how a person recognizes his or her 
country's flag. In principle, it would be pos¬ 
sible to describe flag recognition in terms 
of neural action in the occipital, temporal, 
and frontal lobes, but in practice it is sim¬ 
pler to say that visual pattern-detection and 
memory-recognition processes are involved. 
Individual differences in performance on the 
complex task, flag recognition, can then be 
related to individual differences in vision 
and memory. Failure to recognize the flag 
could be due to a distortion of either visual- 
recognition or memory-retrieval processes. 
Both processes are theoretical entities; there 
is a sense in which they do not exist. That 
is why information-processing functions are 
shown in ellipses in Figure 3.3, while phys¬ 
ically observable brain and behavioral pro¬ 
cesses are shown in rectangles. Nevertheless, 
there is a place for descriptions of theoretical 
processes. 

The top of the figure specifies two types 
of physically observable behaviors - per¬ 
formance on psychological tests and per¬ 
formance in life activities. In the physi¬ 
cal sense both of these are produced by 
the brain. Conceptually, though, we can 
think of both behavior in testing situa¬ 
tions and behavior everywhere else as being 
produced by information-processing capa¬ 
bilities combined with previously acquired 
knowledge. 

The double-headed arrow between 
behaviors on tests and behaviors elsewhere 
indicates that the two different types of 
behavior are correlated, but that there is 
no causal link between them. Psychologi¬ 
cal test scores do not, in themselves, cause 
anything. (The social interpretation of a test 
score may be causal, but that is another 
issue.) What test scores can do is to indicate 
possession of knowledge and information¬ 
processing capacities that control behavior 
in life outside of the testing session. 


The next three sections illustrate these 
principles by giving brief descriptions of 
studies that deal with issues at each of the 
three levels of explanation. 

3.7. A Study at the Psychometric Level 

Psychometric theories deal with the dimen¬ 
sions of individual ability that are believed 
to underlie performance on intelligence and 
personality tests. The goal is to simplify the 
tremendous amount of data contained in 
batteries of tests of cognition or personality, 
such as the various battery tests described 
in Chapter 2, by showing that the data from 
test batteries can be understood in terms of 
individual variations along a small number of 
dimensions of mental ability, called factors. 
Some details of the technique will be 
explained in Chapter 4. Here we concen¬ 
trate on the results, using, as an example, 
a model that will be featured prominently 
throughout this book. 

Two University of Minnesota researchers, 
Wendy Johnson and Thomas Bouchard, had 
available data from a study in which 436 
adult participants had taken 42 different 
tests. 10 There were over 16,000 numbers in 
the data set, obviously more than a per¬ 
son could think about coherently. However, 
Johnson and Bouchard found that a large 
part of the individual variation on the forty- 
two tests could be captured by individual 
variation on just four factors: general intelli¬ 
gence, verbal skill, perceptual skill, and the 
ability to manipulate images “in the mind's 
eye,” that is, to imagine how something 
seen from one perspective might look from 
a different perspective. They referred to 
their model as a general intelligence-verbal- 
perceptual-rotation (g-VPR) model. 

Johnson and Bouchard's work is typical 
of hundreds, if not thousands, of psychome¬ 
tric investigations showing that variation on 
many tests can be summarized by variation 
along a small number of basic dimensions. 
Psychometric studies also provide measure¬ 
ments that are useful in understanding and 

10 Johnson & Bouchard, 2005a. 
controlling social systems, most notably
in personnel selection. The SAT, AFQT,
and WPT tests described in Chapter 2 are
derived from psychometric studies, as are 
almost all college entrance and industrial 
screening tests. 

Psychometric models are less useful when 
the task is to explain individual differences 
on specific cognitive tasks. To illustrate why, 
we take a look at an act of verbal compre¬ 
hension. 

Here is an excerpt from a newspaper 
editorial: 

The citizens of Washington and the nation 
are spending millions and millions of dol¬ 
lars on efforts to protect salmon and save 
them and the ecosystem they support from 
the threat of extinction. 

These goals are being hurt by California 
sea lions that have outsmarted the system 
and now wait patiently for the fish to be cor¬ 
ralled into the ladders at Columbia River 
dams. 

Bellingham (Washington) Herald, 
April 11, 200J 

From the psychometric perspective, in order 
to understand the editorial you have to have 
a certain level of verbal ability - in order to 
understand the passage at all - and a certain 
level of general reasoning ability to realize 
that the problem is that sea lions are eating 
too many fish to sustain the fisheries. This 
fact is not stated directly. It has to be inferred 
from an analysis of the text plus some gen¬ 
eral knowledge about sea lions, predators, 
and prey. 

Psychometric research could be used to 
calculate the probability z that a person with
verbal ability level x and general reasoning
ability level y would understand the passage.
Something like
this sort of analysis is used to design instruc¬ 
tion manuals, where the prose has to match 
the reading and reasoning skills of the peo¬ 
ple who will read the manual. Psychometric 
analyses do not tell you what readers have 
to do in order to comprehend a manual. For 
that sort of analysis we must move to the 
information-processing level. 
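
One way such a calculation might look is sketched below, assuming (purely for illustration) a logistic relationship between the two ability levels and the probability of understanding. The function name `p_understand`, the weights, and the difficulty term are hypothetical; a real psychometric study would estimate them from data and might use a different functional form.

```python
import math


def p_understand(verbal, reasoning, w_verbal=1.0, w_reasoning=1.0, difficulty=0.0):
    """Probability z of understanding a passage under an assumed logistic model.

    verbal (x) and reasoning (y) are ability levels in standard-score units.
    The weights and the difficulty term are placeholders, not estimates.
    """
    logit = w_verbal * verbal + w_reasoning * reasoning - difficulty
    return 1.0 / (1.0 + math.exp(-logit))


# An average reader facing a passage of average difficulty.
z_average = p_understand(0.0, 0.0)
# A reader one standard deviation above average on both abilities.
z_strong = p_understand(1.0, 1.0)
print(round(z_average, 3), round(z_strong, 3))
```

A model of this kind yields a prediction for each reader, but, as the text notes, it is silent about what the reader actually does while reading.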


3.8. A Study at the 
Information-Processing Level 

An examination of what has to be done to 
understand the editorial reveals that, with¬ 
out effort, we do quite a bit of informa¬ 
tion processing. Retrieving word meanings 
is only part of the task. It is complicated 
enough, for both semantic and syntactical 
information must be retrieved, and ambi¬ 
guities must be resolved. Does the word 
Washington in the editorial refer to the city, 
the state, or the president? Then even heav¬ 
ier information-processing demands appear 
during the analysis of sentences and of the 
text as a whole. 

The comprehension process has to oper¬ 
ate on words as they appear. However, the 
meaning of a word, in context, often can¬ 
not be determined until following words are 
received. The initial the in the editorial has 
no meaning until it is attached to citizens. The 
citizens of Washington and the nation specifies 
which citizens and resolves the ambiguity 
of Washington. Next comes are. It appears 
that a descriptor is going to follow, that the 
collective just described (The citizens . . . ) is
going to be equated with something, as in 
The citizens of Washington and the nation 
are taxpayers. This interpretation is dashed 
when spending occurs, for now are must be 
interpreted as an auxiliary verb. Are spending 
is not a complete statement; the comprehen- 
der must find out what is being spent. The 
words millions and millions cannot be inter¬ 
preted until dollars is encountered. 

As this analysis shows, a language com- 
prehender must have a cache for holding 
information to be integrated with informa¬ 
tion not yet received. In the current jargon 
of psychology, the cache is called working 
memory. It is an example of an information¬ 
processing concept, in the sense that it refers 
to an abstract ability to manipulate pieces of 
information in the mind, without specifying 
what these pieces of information mean in 
the world outside the mind. 

The result of the construction is a text 
model summarizing what the text says 
explicitly. The text model alone is not 
enough! In order to reach full understanding
the comprehender must construct a situation
model of what the text means in the context 
in which it is presented. 11 This means going 
beyond the text to understanding things that 
are only referred to obliquely. The editorial 
is about sea lions eating salmon, but that is 
not stated explicitly.

Clearly working memory is central to the 
comprehension process. It follows that indi¬ 
vidual differences in the capacity of working 
memory ought to be related to individual 
differences in verbal comprehension. And 
they are. Marcel Just and Patricia Carpenter, 
professors at Carnegie-Mellon University, 
together with their colleagues, have con¬ 
ducted a substantial series of experiments 
investigating the link. 12 

In order to measure working memory Just 
and Carpenter used a procedure called the 
memory span task. 13 In a memory span task
an examinee is asked to read a fairly compli¬ 
cated sentence aloud, and then remember 
the last word. Another, unrelated sentence is 
then presented. After k sentences the exam¬ 
inee is asked to recall the words. Just and 
Carpenter offer the following example, with 
k = 2: 

Read: 

When at last his eyes opened, there was 
no gleam of triumph, no shade of anger. 
The taxi turned up Michigan Avenue 
where they had a clear view of the lake. 

Recall: 

The examinee should then recall the last 
two words: anger, lake. 

A person’s memory span measure is 
defined as the highest number of sentences 
he/she can read and still recall all the ending 
words. The argument is that working mem¬ 
ory is taxed by having to hold the ending 
words of previous sentences while the cur¬ 
rent sentence is processed. Thus the mem¬ 
ory span is a measure of the capacity of 

11 Kintsch, 1998. 

12 Just & Carpenter, 1992; Just, Carpenter, & Keller, 
1996. 

13 Daneman & Carpenter, 1980, 1983; Daneman & 
Merikle, 1996. 


working memory. College students’ mem¬ 
ory spans vary from 2 to 5.5, averaged over 
participants. 14 
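
The scoring rule just described can be sketched as follows. This is a simplified illustration, not the published procedure: it assumes one trial per set size, ignores the order of recall, and takes the span to be the largest set size passed. The function names are hypothetical.

```python
def final_words(sentences):
    """Return the last word of each sentence, lowercased, without punctuation."""
    return [s.split()[-1].strip(".,;:!?\"'").lower() for s in sentences]


def trial_passed(sentences, recalled):
    """A trial is passed when every sentence-final word is recalled."""
    return set(final_words(sentences)) <= {w.lower() for w in recalled}


def memory_span(results):
    """results maps set size k to True/False; the span is the largest k passed."""
    passed = [k for k, ok in results.items() if ok]
    return max(passed) if passed else 0


trial = [
    "When at last his eyes opened, there was no gleam of triumph, no shade of anger.",
    "The taxi turned up Michigan Avenue where they had a clear view of the lake.",
]
print(final_words(trial))
print(trial_passed(trial, ["anger", "lake"]))
print(memory_span({2: True, 3: True, 4: False}))
```

In this sketch an examinee who passes the two-sentence and three-sentence sets but fails the four-sentence set receives a span of 3.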

The Carnegie group found that memory 
spans are associated with the ability to com¬ 
prehend text. 15 One of the comprehension 
examples they used was the ability to detect 
ambiguities, as in 

The experienced soldiers warned about the dangers. . . .

which could mean either that the soldiers 
warned some to-be-specified people or that 
the soldiers themselves had been warned by 
some unspecified person, as in 

The experienced soldiers warned about the dangers conducted the midnight raid.

This contrasts with a phrase like 

The experienced soldiers spoke about the dangers. . . .

where the soldiers are unambiguously doing 
the speaking. 

Just and Carpenter argued that only peo¬ 
ple with high memory span could afford 
to keep two interpretations (alternative text 
models) in mind, and that this allowed them 
to achieve a better understanding of com¬ 
plicated texts than could be achieved by 
people with low memory span, and hence 
low working-memory capacities. They then 
conducted a series of experiments showing 
that their conclusion was correct. For exam¬ 
ple, high-span people read sentences faster 
than low-span people. However, high-span 
people read sentences containing ambigu¬ 
ous verbs, like warned in the illustration, 
more slowly than they read sentences with 
unambiguous verbs. Low-span people read 
both types of sentence at about the same 
rate. Evidently only the high-span individu¬ 
als noticed the ambiguity and carried both 
meanings forward until the ambiguity was 
resolved. 

This work is clearly reductionist. A psy¬ 
chometric construct, individual differences 
in verbal comprehension, was related to an 

14 Just and Carpenter, 1992. 

15 See, especially, MacDonald, Carpenter, & Just, 1992. 
information-processing concept, working-memory
capacity.

3.9. Studies at the Brain Level 

Explanations of behavior that use infor¬ 
mation-processing concepts such as 
working-memory capacity or retrieval 
of word meaning are less abstract than 
explanations based on terms like verbal 
ability or general reasoning ability, but 
they are still abstract. There has always 
been considerable interest in explaining 
individual differences in behavior in terms 
of individual differences in brain structures 
and processes. Historically, this was done 
by drawing inferences about normal behav¬ 
ior by analogy to extreme differences in 
behavior associated with damage to the 
brain, as in J. P. Das’s development of 
tests for normal children based on Luria’s 
observations of the results of brain injury. 
Today’s advanced medical technology has 
made it possible to extend the approach, 
because we can take a limited look at 
the brain in vivo, as people are thinking.
Chapter 7 reviews the current state of 
knowledge in this rapidly evolving field. 
Here a few quick remarks are in order, 
to show how models based on discoveries 
in the brain sciences fit into theories of 
intelligence. 

The new technologies fall into three 
broad classes. A variety of imaging tech¬ 
niques have been developed that permit 
direct looks at structures in the living brain, 
at a much finer level of detail than provided 
by the X-ray imaging technique developed 
early in the twentieth century. (The X-ray 
method itself has become far more accu¬ 
rate.) Other technologies, and especially a 
technique called functional magnetic reso¬ 
nance imaging (fMRI), provide information 
on metabolism, and hence energy use, in 
different regions of the brain. These tech¬ 
niques have good spatial resolution, but, 
because metabolic processes themselves are 
slow, and hence reflect the brain’s use of 
energy over a period of seconds, temporal 
resolution is poor. Techniques for recording 


the electrical traces of neural activity, such 
as the electroencephalogram (EEG), were 
developed in the mid twentieth century, 
but today’s capability for recording electrical 
events is greatly enhanced over what it was 
as recently as the 1990s. Electrical recording 
techniques can detect events at the millisec¬ 
ond level but have relatively poor spatial res¬ 
olution. 

To give an idea of the current state of 
affairs, here are some results related to lan¬ 
guage processing. 

Just and Carpenter’s Carnegie-Mellon
group 16 had participants read sentences of varying
syntactic complexity, such as

The reporter attacked the senator and 
admitted the error. 

The reporter that attacked the senator 
admitted the error. 

The reporter that the senator attacked 
admitted the error. 

while fMRI images were being taken of their 
brains. Metabolic activity increased in two 
areas previously known to be involved in 
language comprehension - a region in the 
left frontal cortex known as Broca’s area, 
and a nearby region in the left temporal
lobe known as Wernicke’s area. As sentences
increased in syntactic complexity the
number of areas of the brain showing acti¬ 
vation increased. Further studies of this sort, 
using linguistic and nonlinguistic stimuli, 
have shown that as the difficulty of a task 
increases, metabolism increases in the areas 
of the brain required to solve the task. How¬ 
ever, studies of individual differences have 
shown that high performers (e.g., people 
with high verbal intelligence test scores) 
show lower metabolic rates than low per¬ 
formers when attacking the same task. 17 Evi¬ 
dently the harder the problem, the harder 
the brain has to work, but the more intel¬ 
ligent the person, the less the brain has 
to work. Intelligent people have efficient 
brains. 

16 Just, Carpenter, Keller, Eddy, et al., 1996. 

17 See Neubauer, 2000, and Vernon, Wickett, Bazana,
& Stelmack, 2000, for reviews. 
3.10. A Critique of the Levels
Approach 

Do we need three levels of theories? 

There is a compelling argument for the 
psychometric level. The tests are needed, 
solely because of a requirement for the effi¬ 
cient measurement of intelligence in person¬ 
nel selection and utilization, ranging from 
university entrance examinations to indus¬ 
trial employment programs. Since the tests 
are needed, some orderly way of thinking 
about the results is also required. Psycho¬ 
metric theories provide a way of doing this. 

There is an equally compelling, but quite 
different, argument for brain-level theories 
of intelligence. They are an important step in 
understanding, for they link complex behav¬ 
ioral observations, such as verbal compre¬ 
hension, to events inside the brain. Under¬ 
standing these links is an essential part of 
the reductionist program of identifying the 
causal links between individual differences 
in brain processes and structures and indi¬ 
vidual differences in cognition. 

Do we need a theory of intelligence at 
the information-processing level, interven¬ 
ing between models of individual differences 
in the brain and individual differences in test 
performance and in daily life? I believe that 
we do. 

A complete information-processing 
theory of intelligence would provide (a) an
understanding of the information¬ 
processing mechanisms that underlie 
thought, (b) measurements of individual 
differences in the power of these mecha¬ 
nisms, and (c) measurements of individual 
differences in the knowledge bases that are 
manipulated by the information-processing 
mechanisms. I see this sort of theory as 
a substantial advance on a theory that 
explains the way intelligence is expressed 
outside the laboratory in terms of corre¬ 
lations between measures of real-world 
cognition and either test scores or statistical 
summaries of them. I do not believe that 
a model of intelligence based on the brain 
can fulfill this role, because in practice we 
are often interested in individual differences 
in a capacity for processing information, 


not individual differences in activation 
levels in this or that brain region. For 
instance, in assigning air traffic controllers it 
is important to know how many different 
aircraft a given controller can handle at one 
time - and there are individual differences 
in this ability. The requirement is stated 
in terms of information-processing behav¬ 
iors. Psychological models to address this 
sort of issue must be stated in terms of 
information-processing capabilities. 

This argument is not universally ac¬ 
cepted. Ian Deary, a professor at the Univer¬ 
sity of Edinburgh, has made a cogent argu¬ 
ment against information-processing models 
of intelligence. 18 

Deary points out two weaknesses of 
current information-processing approaches. 
The first is that there is no guarantee that 
the tasks used to measure information¬ 
processing functions, such as the span task, 
measure what they say they do, because 
there is no agreed-upon theory of cog¬ 
nition at the information-processing level, 
and therefore no unassailable measure of 
the postulated functions. The second weak¬ 
ness is that many of the alleged elementary 
information-processing tasks require some 
of the same behaviors as the psychometric 
tasks. He does not see an advantage in restat¬ 
ing a psychometric theory in information¬ 
processing terms. 

Deary proposes that research on intelli¬ 
gence ought to try to link psychometric test 
scores directly to physiological measures. 19 
Deary says that his view of the current state 
of theories of intelligence conforms to “the 
Scottish stereotype of dourness; stern and 
unrelenting.” 20 I am somewhat more opti¬ 
mistic about the need to have information¬ 
processing levels of explanation, and a bit 
more dour, stern, and unrelenting concern¬ 
ing the utility of studying a system consist¬ 
ing solely of brain-based and psychometric 
variables. 

The brain does not provide a place for a 
complicated trait like intelligence; it provides 

18 Deary, 2000, Chapters 5-8. 

19 Deary, 2000, Chapter 9.

20 Deary, 2000, p. 329. 




a kit of tools, the information-processing 
functions, that, when guided by knowledge, 
can be assembled into intelligent behav¬ 
ior. The way in which the brain is orga¬ 
nized, at the functional level, defines the 
kit of tools and tells us how they must be 
connected. Concepts like working memory, 
attention, and speed of information pro¬ 
cessing are useful ways of talking about 
the functional underpinnings of intellec¬ 
tual behavior, just as concepts like accel¬ 
eration, deceleration, gasoline consumption, 
and turning ratio are useful ways of talking 
about the functional underpinnings of auto¬ 
mobile capabilities. 

The problem with going directly from 
brain-based variables to psychometric vari¬ 
ables is that the steps are simply too com¬ 
plex to comprehend. While we now know a 
great deal about the involvement of differ¬ 
ent regions of the brain in different types of 
cognitive activity, we know much less about 
how networks of neurons create memories 
and conduct inferential reasoning, although 
we have interesting mathematical models 
of how neural networks might accomplish 
these functions. 21 Advances in the cognitive 
neurosciences will tell us what the neural 
bases of working memory, long-term mem¬ 
ory storage, attention, and similar functions 
are. We will still need to understand how 
these functions are assembled to produce 
intelligent behavior. And we need to allow 
for the possibility that different people will 
assemble them in different ways. 

I have somewhat more sympathy for 
Deary’s “dour and unrelenting” view of cur¬ 
rent progress. He questions whether or not 
the tasks that have so far been used to assess 
information-processing concepts actually do 
this, and do this at a level that is more 
elementary than the behaviors required on 
a psychometric test itself. This issue can 
be resolved only on a case-by-case basis. It 
is true that there are disagreements about 
what the basic information-processing func¬ 
tions of the mind are, and about how each 

21 This research is done under the general topic of
connectionist modeling. See E. Hunt (2002, 2007)
for discussions.


function should be measured. However, the 
more research we do, the more psycholo¬ 
gists seem to be converging on an agreed- 
upon information-processing model, includ¬ 
ing agreement on how its properties should 
be measured and how these measurements 
should be coordinated with information 
about how the brain functions. Thus there is 
no qualitative difference between my view 
and Deary’s. It is just that we vary in opti¬ 
mism. Deary puts his eye squarely on the 
empty portion of the glass, albeit admitting 
that there is some liquid in it. My eye is more 
focused on the liquid, but I certainly admit 
that the glass is not full. 

3.11. Summary 

Theories are needed for two reasons - in 
order to develop models for predicting and 
controlling important systems and in order 
to reduce complex phenomena to more 
basic levels. Scientific theories are further 
distinguished by a commitment to empiri¬ 
cism, objectivity, and empirical verification. 

Theories of intelligence can be stated at 
the psychometric, information-processing, 
and biological levels. All are needed, but 
for different purposes. Psychometric the¬ 
ories provide concise summaries of the 
dimensions of variation in cognitive com¬ 
petence. They also play an important role 
in personnel selection systems. Information¬ 
processing theories provide a functional 
description of why the variables at the psy¬ 
chometric level correlate in the way they 
do. This includes coordination with mea¬ 
sures of brain processes. Biological theories 
relate intelligent behavior to individual dif¬ 
ferences in brain structure, brain processes, 
genetic inheritance, or any combination of 
these three. 

Psychometric-level models and biological-level
models relate observables to observables.
Information-processing models are, in
a sense, fictions. They are extremely useful 
fictions. Indeed, they may be necessary. In 
addition to being the appropriate level at 
which to predict some aspects of socially 
relevant behavior, information-processing 
theories provide a vital link between the
psychometric and biological levels. 

Cogent criticisms can be directed at all 
current models at all levels. Criticizing cur¬ 
rent progress is not the same as denying 
the need for models. We never know when 
we have the “correct” model, for as we 
push our observations further we are almost 


certain either to change our current ideas 
or to decide that they are accurate within 
a particular set of observations, but need to 
be accommodated to allow consideration of 
other observations. This has been the nature 
of progress in every other science; there is no 
reason to expect it to be any different in the 
scientific study of intelligence. 


CHAPTER 4 


Psychometric Theories 


The definition of an ability arises from 
systematic observations of individual 
differences in performance on a defined set 
of tasks. These observations constitute the 
empirical basis of ability measurement. 
They require no assumptions or exact 
knowledge about neurophysiological 
functions that might be responsible for 
performance levels; although specialists
outside the strict field of psychometrics 
may find it possible and useful to seek such 
knowledge. 

John B. Carroll, 1993, p. 23 

Psychometric models of intelligence assume 
that variations in cognitive performance 
across different situations can be summa¬ 
rized by individual differences in a fairly 
small number of basic cognitive dimensions, 
such as general reasoning, verbal facility, 
and visual-spatial reasoning. In general, the 
effort to find these dimensions has been 
quite successful. In the last analysis, though, 
psychometric models are summaries of test 
scores, and are relevant to cognition in gen¬ 
eral to the extent that test performance 


itself is relevant to individual differences in 
thinking, as revealed in daily life. In Chap¬ 
ter 1, I argued that test performance is rele¬ 
vant, as an imperfect sample of the cognitive 
skills we use on a daily basis. This chapter 
concentrates on theories of cognition within 
that imperfect sample. The following chap¬ 
ter discusses expansion of these theories. 

4.1. What Are Psychometric Models? 

Whenever we summarize scores on differ¬ 
ent pieces of performance we are implic¬ 
itly assuming that there is some common 
thread underlying each piece. This is a very 
common assumption. The SAT math and 
verbal scores are summaries of the scores 
on subtests, and these summaries are them¬ 
selves amalgamated to produce an overall 
score. Similar summaries are used by the 
US Armed Services (summarizing subtests
on the ASVAB to produce the AFQT),
and on what are avowedly intelligence tests,
like the WAIS (four index scores) and
the Woodcock-Johnson (WJ) tests (fluid
and crystallized intelligence, and a number
of other scales). Psychometric theories are
basically expansions on this idea. Initially 
tests are chosen to represent the variety of 
cognitive performances that the investigator 
is interested in. Statistical analyses, which 
are reviewed briefly in the next section, are 
used to discover the latent traits underlying 
performance on ostensibly different tests. In 
the case of research programs the analysis 
often stops there. In the case of continu¬ 
ing programs of test development, such as 
the SAT and WAIS programs, the analyses 
of present tests are used to create improved 
revisions of the original test. 

The next section is a brief description of 
factor analysis, a key statistical tool in test 
development. Following sections discuss the 
theories that have been developed using the 
tool. 

4.2. Factor Analysis 

This section is intended to develop an intu¬ 
itive feeling for the method. It certainly is 
not a substitute for a text on modern factor 
analysis.

A psychometric model takes as the data 
to be explained the covariances (or
correlations, in the case of standard scoring)
among K observable measures. The expla¬ 
nation offered consists of a statement of 
M underlying hypothetical variables, called 
latent traits. The number of latent traits, 
M, should be substantially smaller than the 
number of tests, K. Further restrictions may 
also be implied. 

Now let us unpack these mathematical 
ideas with a simple example. 

4.2.1. The Introductory Example 

Suppose that high school students were 
given the following tests: 

Vocabulary: A test of their ability to 
define words. 

Sentence comprehension: A test of their
ability to define the meaning of sen¬ 
tences. 

Numerical computation: A test of their
ability to do computations involving 


the basic operations of addition, sub¬ 
traction, multiplication, and division. 

Algebra: A test of their ability to solve 
algebraic problems. 

Word problems: A test of their ability 
to solve mathematical problems pre¬ 
sented in words. (According to one 
cartoonist, the library in Hell contains 
nothing but books of word problems.) 

Our intuitions, which in this case would 
be supported by a great deal of research, 
are that in a typical class there would be a 
strong tendency for the students who did 
well on one test to do well on the others. 
Conversely, students who did poorly on one 
test would tend to do poorly on the oth¬ 
ers. This condition, in which all tests in a 
battery are positively correlated, is called 
positive manifold. 

Positive manifold is a matter of degree, 
for correlations between tests are seldom 
perfect. The best student on the mathe¬ 
matics tests might not be the best student 
on the language-related tests, but he or she 
would probably be toward the top. The 
same thing would be true of the worst stu¬ 
dent on the mathematics tests - probably 
toward the bottom on the verbal tests but 
not necessarily right at the bottom. To sum¬ 
marize, performance on one test would be 
a useful predictor of performance on the 
other tests, but the prediction would not be 
perfect. 

The hypothetical example reflects the 
reality of intelligence testing. Virtually all 
tests of cognitive performance exhibit posi¬ 
tive manifold to some degree, and they often 
do so quite strongly. 
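
Positive manifold is easy to check numerically. The following is a minimal Python sketch that simulates scores on the five hypothetical tests and checks that every pair of tests is positively correlated. The simulation model (a shared general component plus separate verbal and mathematical components, with the weights shown) and the function names are assumptions made purely for illustration; none of it comes from real test data.

```python
import numpy as np


def simulate_scores(n_students=2000, seed=0):
    """Simulate five test scores from correlated verbal and math abilities.

    The weights below are illustrative assumptions, not estimates.
    """
    rng = np.random.default_rng(seed)

    def noise():
        # Test-specific variation unrelated to either ability.
        return 0.6 * rng.standard_normal(n_students)

    general = rng.standard_normal(n_students)
    verbal = 0.7 * general + 0.7 * rng.standard_normal(n_students)
    math_ability = 0.7 * general + 0.7 * rng.standard_normal(n_students)
    return np.column_stack([
        verbal + noise(),                              # vocabulary
        verbal + noise(),                              # sentence comprehension
        math_ability + noise(),                        # numerical computation
        math_ability + noise(),                        # algebra
        0.5 * verbal + 0.5 * math_ability + noise(),   # word problems
    ])


def has_positive_manifold(scores):
    """Return True if every pair of tests (columns) is positively correlated."""
    corr = np.corrcoef(scores, rowvar=False)
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return bool(np.all(off_diagonal > 0))


if __name__ == "__main__":
    scores = simulate_scores()
    print(np.round(np.corrcoef(scores, rowvar=False), 2))
    print("positive manifold:", has_positive_manifold(scores))
```

The check says nothing about how strong the correlations are; as the text notes, positive manifold is a matter of degree.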

In the hypothetical case, it seems reason¬ 
able to suppose that there are two under¬ 
lying abilities involved, verbal and math¬ 
ematical ability. If the two abilities are 
themselves correlated, it might be appropri¬ 
ate to talk about a general cognitive ability. 
But how are we to find out, in an objective 
manner, whether we can reduce the five¬ 
dimensional space defined by the observable 
test scores to just two or three dimensions 
based on underlying traits? Factor analysis 
answers this question. 
Figure 4.1. A hypothetical example in which five tests are given to a
group of high school students. Each test can be thought of as an 
arrow that points outward from the origin (which represents zeroes 
on all tests). The more highly correlated the tests, the more their 
vectors will point in the same direction. In this example the 
mathematical and verbal tests form separate groups, and the word 
problems test, which contains both mathematical and verbal 
components, lies in-between the groups. 


Factor analysis comes in two major vari¬ 
eties - exploratory factor analysis and con¬ 
firmatory factor analysis - and several sub- 
varieties. First we consider the exploratory 
technique. 

4.2.2. Exploratory Factor Analysis 

Here is a geometric way of thinking about 
the five tests just introduced. Imagine that 
each test is an arrow, pointing somewhere 
in a space defining mental abilities, and 
that each person's score is a point some¬ 
where along that arrow. If there is no cor¬ 
relation between any pair of tests, each of 
these arrows will be placed at right angles 
to (be orthogonal to) all of the others, so that
the arrows would point along the axes of a 
five-dimensional space, and no further sim¬ 
plification would be possible. If the tests 
are positively correlated, the arrows will 
tend to point in the same general direction. 
The higher the correlation between any two 
tests, the more closely the arrows will point 
in the same direction. In the extreme, if the 


correlation between two tests is one, the cor¬ 
responding arrows will point in exactly the 
same direction. 
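
The arrow picture can be made exact. If each test's scores are centered on their mean, the correlation between two tests equals the cosine of the angle between the two score vectors, so a correlation of zero corresponds to a right angle and a correlation of one to a zero angle. A minimal sketch, with made-up scores and a hypothetical helper name:

```python
import numpy as np


def angle_between_tests(x, y):
    """Angle in degrees between the 'arrows' for two tests.

    Each arrow is the vector of mean-centered scores; the cosine of the
    angle between two such vectors is the Pearson correlation.
    """
    xc = np.asarray(x, dtype=float) - np.mean(x)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    cosine = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))
    # Clip guards against tiny floating-point overshoot beyond [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0))))


if __name__ == "__main__":
    vocabulary = [12, 15, 9, 20, 17, 11]   # invented scores
    algebra = [10, 13, 12, 18, 14, 9]      # invented scores
    print(angle_between_tests(vocabulary, algebra))
```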

The situation is shown in Figure 4.1, 
which is, of necessity, a two-dimensional 
projection of the directions of the arrows 
representing the tests. Exploratory factor 
analysis finds a line that, in a precisely 
defined mathematical sense (which need not 
concern us, other than knowing that the def¬ 
inition exists), best summarizes the com¬ 
mon direction in which the tests are point¬ 
ing. This is shown by the dashed arrow in 
Figure 4.2. It is called the first or general factor 
for the five-test battery. This provides a con¬ 
siderable simplification of the data. Instead 
of representing a person's score by five num¬ 
bers, for five tests, we can represent the score 
by just one number, his or her factor score 
on the general factor. Obviously some loss 
of accuracy will result, just as a summary 
score on a test loses information about what 
problems a student did or did not answer 
correctly. But how much accuracy has been 
lost? 
Figure 4.2. The first and second factors extracted from the
hypothetical tests shown in Figure 4.1. 


In order to answer this question we work 
backward, by determining the correlations 
between the first factor scores and each 
of the original scores. These are called the 
loadings of the original test on the first fac¬ 
tor. Unless the loading is 1 on every test 
(which would imply that all the correlations 
between tests were also 1) there will be some 
loss of accuracy; the first factor score does 
not predict the original test scores perfectly. 
How great a problem is this? 

Consider a particular student, Ignatz. 
Ignatz's score on the general factor will be 
influenced by his score on all five of the orig¬ 
inal tests. Suppose that Ignatz did excep¬ 
tionally well on the vocabulary test, but 
poorly on the other tests. His score on the 
general factor will therefore be lower than 
would be expected from the vocabulary 
test score alone. Factor analysis represents 
Ignatz's vocabulary test score by two scores, 
the score predicted from Ignatz’s position 
on the general factor and a residual, reflect¬ 
ing the difference between the actual score 
and the predicted score. 

Residual score on vocabulary 

= Actual score - score predicted from 
first factor 

Actual score = Score predicted from 
first factor + residual score 
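
The loading and residual ideas can be sketched in a few lines of Python. The sketch below uses the first principal component of the correlation matrix as a stand-in for the "precisely defined mathematical sense" in which the first factor summarizes the tests. That choice, the function name, and the simulated scores in the usage example are assumptions for illustration; dedicated factor analysis software typically uses more refined estimates.

```python
import numpy as np


def first_factor_decomposition(scores):
    """Split standardized scores into first-factor predictions and residuals.

    scores: array of shape (n_people, n_tests).
    Returns (factor_scores, loadings, predicted, residual).
    """
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    direction = eigvecs[:, -1]                   # direction of the first factor
    if direction.sum() < 0:                      # make the factor point "upward"
        direction = -direction
    factor = z @ direction
    factor = factor / factor.std()               # standardize the factor scores
    # Loadings: correlation of each test with the factor scores.
    loadings = np.array(
        [np.corrcoef(factor, z[:, j])[0, 1] for j in range(z.shape[1])]
    )
    predicted = np.outer(factor, loadings)       # score predicted from the factor
    residual = z - predicted                     # actual minus predicted
    return factor, loadings, predicted, residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    general = rng.standard_normal(500)
    scores = np.column_stack(
        [general + rng.standard_normal(500) for _ in range(5)]
    )
    factor, loadings, predicted, residual = first_factor_decomposition(scores)
    print(np.round(loadings, 2))
```

A student like Ignatz, strong on vocabulary and weak elsewhere, would show up here as a row with a large positive residual in the vocabulary column.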


Repeating this procedure, the residual 
scores can be correlated, the machinery of 
exploratory factor analysis applied to the 
residual correlations, and voilà, we have
a second factor representing any common
trend in the “directions” defined by the residual
scores. However, these directions will be
lines in a four-dimensional space, because
one dimension has been “lost” to the first 
factor. The result of this process is shown by 
the second dashed line in Figure 4.2. 

This process can be continued, analyz¬ 
ing residuals, then residuals of residuals, 
until the size of the residuals becomes so 
small that we may treat them as “noise.” 
(When the number of factors extracted is 
equal to the number of tests, the residuals 
will be identically zero.) In most applica¬ 
tions the residuals become small enough to 
ignore after only a few iterations. At that 
point we stop extracting factors. More gen¬ 
erally, we say that we extract M factors from 
K tests, where M is always less than K, and 
in most cases substantially less. In our hypo¬ 
thetical example we stop at two factors, as 
shown in Figure 4.2. 
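
The extract-then-analyze-the-residuals loop can be sketched as follows. The correlation matrix is invented to resemble the five-test example (two verbal tests, two mathematical tests, and word problems in between), and each factor is taken as the leading principal axis of the current residual matrix. Both are assumptions for illustration, not the procedure of any particular software package.

```python
import numpy as np

# Hypothetical correlations. Order: vocabulary, sentence comprehension,
# numerical computation, algebra, word problems.
R = np.array([
    [1.00, 0.70, 0.30, 0.30, 0.50],
    [0.70, 1.00, 0.30, 0.30, 0.50],
    [0.30, 0.30, 1.00, 0.70, 0.50],
    [0.30, 0.30, 0.70, 1.00, 0.50],
    [0.50, 0.50, 0.50, 0.50, 1.00],
])


def extract_factors(corr, n_factors):
    """Extract factors one at a time from successive residual matrices.

    Returns (loadings, residual): loadings has one column per factor, and
    residual is what remains of the matrix after all extracted factors
    have been removed.
    """
    residual = np.array(corr, dtype=float)
    columns = []
    for _ in range(n_factors):
        eigvals, eigvecs = np.linalg.eigh(residual)   # ascending eigenvalues
        top = np.sqrt(max(eigvals[-1], 0.0)) * eigvecs[:, -1]
        if top.sum() < 0:                             # fix the arbitrary sign
            top = -top
        columns.append(top)
        residual = residual - np.outer(top, top)      # remove what was explained
    return np.column_stack(columns), residual


if __name__ == "__main__":
    loadings, residual = extract_factors(R, 2)
    print(np.round(loadings, 2))
    print(np.round(residual, 2))
```

In this invented matrix the second column has opposite signs for the verbal and the mathematical tests, and extracting as many factors as there are tests leaves a residual matrix of zeros.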

The next step is to develop a psychologi¬ 
cal interpretation of the factors. At this point 
art supplements science. 

In the example the first factor lies 
between the cluster of verbal tests and the 
cluster of mathematical tests. Let us call this 

Figure 4.3. A correlated (oblique) factor solution for the
hypothetical example, defining correlated verbal and mathematical
abilities.


factor “general competence.” The direction
of the general factor is very close to the 
direction of the arrow representing the word 
problem test. This is reasonable, because 
solving word problems requires a combina¬ 
tion of verbal and mathematical skills. The 
word problem test would be called a marker 
for the general competence factor, because 
the test and the factor point in nearly the 
same direction. 

Because the second factor summarizes 
residual scores, the first and second factors 
must be uncorrelated. Geometrically, the 
arrows for the two factors will be at right 
angles to (orthogonal to) each other. The two
verbal tests and the two mathematical tests 
point in different directions along the sec¬ 
ond factor. Mathematically, we would say 
that one set of tests has a positive loading 
and the other has a negative loading on the 
second factor. 

One interpretation of the second factor in 
our hypothetical example is that it reflects 
negative correlations between the residuals 
of scores on the language and mathematical 
tests, suggesting that people can be classified
as “relatively verbally skilled” or “relatively
mathematically skilled,” after having
taken account of their general competence. The
qualifying phrase is important. Findings like
those in this example would not be evidence
that people are either “good at math” or


“good at language tasks,” but not both. It is 
only after the effects of general ability have 
been removed that a tilt toward linguistic or 
mathematical problem solving appears. 

This example is not completely made up. 
It is a simplification of an analysis conducted 
by Werner Wittmann, at the University of 
Mannheim in Germany. Wittmann analyzed 
scores achieved by fifteen-year-old students 
on the Program for International Student 
Assessment (PISA) mathematics and sci¬ 
ence examinations. 1 His two-factor solution 
resembles the one offered here. 

The division of the two factors into a gen¬ 
eral factor and an orthogonal residual has 
been dictated by the requirement that the 
two factors be orthogonal to each other. 
While this simplifies the mathematical pre¬ 
sentation, there is no compelling argument 
that a mathematically simple explanation 
will always lead to the clearest psycholog¬ 
ical explanation. An alternative way of pro¬ 
ceeding is to use factor analysis to identify 
the number of dimensions required, two in 
this case, and then to rotate the dimensions 
so that each of the factors lies amid a clus¬ 
ter of similar tests. (These notions can be 
given a mathematical definition.) Figure 4.3 
shows how this method of factor defini¬ 
tion would be applied to the hypothetical 

1 Wittmann, 2005. 
example. This model identifies two psychologically
interpretable factors: mathematical
ability, with high loadings for the two
overtly mathematical tests, and verbal abil¬ 
ity, with high loadings on the two verbal 
tests. Instead of regarding the word prob¬ 
lems test as a marker for a general compe¬ 
tence factor, it is seen as a test requiring an 
amalgam of mathematical and verbal ability. 
This avoids the problem of dealing with a 
factor that seems to demand a negative cor¬ 
relation between abilities. The simplification 
has come at a cost. The mathematical model 
has been made more complex by adding an 
additional parameter, showing the correla¬ 
tion between factors. 

In fact, one could argue that the fact that 
the mathematical and verbal ability factors 
are correlated is itself evidence for a general 
factor. 

This example illustrates both the power 
and the weakness of exploratory factor analysis.
The technique provides an objective
and accurate method for determining how
many factors are required to account for 
individual variation in the scores, across 
tests. Assigning meaning to the factors 
requires a mix of further mathematical 
assumptions and intuitions about the psy¬ 
chology underlying the variation. Given this 
mixture of mathematics and psychology, it 
is not surprising that a good deal of the liter¬ 
ature on intelligence from the 1930s through 
the 1960s reflected debates about how fac¬ 
tor analyses should be interpreted. The field 
was considerably advanced by the develop¬ 
ment of confirmatory factor analysis in the 
1970s. 

4.2.3. Confirmatory Factor Analysis 

Confirmatory factor analysis provides a way 
of deciding how well a theoretical model 
matches the observations in a psychometric 
study. 2 The investigator decides, in advance, 
the relationships that he or she believes 
hold between factors and tests, and for that 

2 Confirmatory factor analysis is a subset of a more 
general statistical technique, structural equation 
modeling. As this method is fairly complicated I will 
touch on it only when necessary. 


matter, between the factors. Computation¬ 
ally demanding algorithms are used to com¬ 
pare the observed relations between mea¬ 
sures to the investigator’s assumptions. This 
moves factor analysis from a method for 
inducing relations from data to a method 
for testing hypotheses about data - a prefer¬ 
able approach in data analysis. The follow¬ 
ing example illustrates the method. The 
example is due to Karl Joreskog, a Swedish 
psychometrician who was one of the major 
developers of confirmatory factor analysis, 
and his colleague Dag Sorbom. 3

Psychological tests can be given in either 
speeded or unspeeded conditions. In a 
speeded test examinees have to answer all 
questions within a fixed time period; in 
an unspeeded test examinees take as much 
time as they like. Joreskog used confirmatory 
factor analysis to investigate the question 
of whether speed pressures influence some 
people more than others. The examinees in 
this study were given four vocabulary tests. 
Two of the tests, x1 and x2, were fifteen-item
tests, administered with ample time to
complete all fifteen items. The other two
measures, y1 and y2, were seventy-five-item
tests, administered under strict time
limits. The question of interest is whether
the same trait, word knowledge, was measured
in each condition or whether, as is sometimes
asserted, there are some people who
just do not do well on speeded tests, even
if they know the information. Table 4.1
shows the variance-covariance matrix for 
the four tests, which is what was analyzed. 
The bottom part of the table shows the cor¬ 
relation coefficients implied by the variance- 
covariance terms. 
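
The step from covariances to correlations is simple arithmetic: each covariance is divided by the product of the two tests' standard deviations. The sketch below applies this to the variance-covariance values of Table 4.1 as I read them from the table; the function name is hypothetical.

```python
import numpy as np

# Variance-covariance matrix from Table 4.1, order x1, x2, y1, y2.
S = np.array([
    [86.3979, 57.7751, 56.8651, 58.8986],
    [57.7751, 86.2632, 59.3177, 59.6683],
    [56.8651, 59.3177, 97.2850, 73.8201],
    [58.8986, 59.6683, 73.8201, 97.8192],
])


def covariance_to_correlation(cov):
    """Divide each covariance by the product of the two standard deviations."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


if __name__ == "__main__":
    print(np.round(covariance_to_correlation(S), 2))
```

Rounded to two decimals, the result should agree with the lower panel of Table 4.1, for example 0.62 for x1 with y1 and 0.64 for x1 with y2.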

Joreskog and Sorbom assumed that dif¬ 
ferent, possibly correlated latent traits 
( factors ) underlie performance on the two 
types of vocabulary tests. Their assumption 
is shown in Figure 4.4, using the graphic con¬ 
ventions introduced in Chapter 1. Causative 
relationships are indicated by single-headed 
arrows. For example, the first factor, f1, is
assumed to be a cause of performance on the 

3 Joreskog & Sorbom, 1979. The example is discussed
on p. 52 ff. The original data is due to Lord (1956).
Table 4.1. The variance/covariance and correlation matrices
in the confirmatory factor analysis example. Tests x1 and x2
are unspeeded vocabulary tests; tests y1 and y2 are
vocabulary tests completed under a stringent time limit.

Variance/Covariance Matrix
          x1         x2         y1         y2
x1     86.3979
x2     57.7751    86.2632
y1     56.8651    59.3177    97.285
y2     58.8986    59.6683    73.8201    97.8192

Correlation Matrix
          x1         x2         y1         y2
x1      1.00
x2      0.67       1.00
y1      0.62       0.65       1.00
y2      0.64       0.65       0.76       1.00

Source: Data from amalgamating information in Tables 2 and 3 of
Joreskog & Sorbom, 1979.


two unspeeded tests. Accordingly there are 
single-headed arrows from latent variable f1
to observable variables x1 and x2.

The boxes, ellipses, and arrows establish 
the structure of a factor analytic model. The 
e’s and the Greek letters indicate the three 
types of parameters to be estimated. The 
β variables indicate the relationship between
the latent and the observed variables.
These represent causation; the latent
variable is seen as being one of the causes
of the observed variable, and the value of β
indicates the extent of the causal relation. 
The e variables represent residuals. 

Residuals are not “error” terms in the
sense that an error is a mistake. They 
represent the effects of all unmeasured fac¬ 
tors that might influence the observable 
scores, but are not part of the model. In this 
example each residual term is uniquely 
Figure 4.4. The two-factor model used by Joreskog and Sorbom to
summarize correlations between unspeeded (x1, x2) and speeded
(y1, y2) vocabulary tests. The model assumes two latent traits, one
representing the ability to do speeded tests and another representing
the ability to do unspeeded tests. A correlation between the two
latent traits is permitted, with the correlation, ρ, to be estimated
from the data. The observed correlations are indicated by the lower
dashed lines between the x and y variables. This model could be
contrasted to a single-factor model by requiring that ρ be identically
one. Note the residual terms showing external influences on each of
the four tests. (Figure modeled after Joreskog & Sorbom, 1979,
Figure 3, p. 54.)
associated with a measurable variable. It is
possible to construct models that postulate 
an unmeasured residual variable that influ¬ 
ences several latent variables. If the system 
of variables were to be completely closed, so 
that the hypothesized factors were the only 
factors influencing the observable measures, 
the residuals would always be zero. In the 
social and behavioral sciences this virtually 
never happens. By examining residuals we 
can get an idea of the extent to which a 
model captures the variation between the 
observables. The smaller the residuals, the 
more the model captures the variations that 
are to be explained. 

The ρ symbols indicate the magnitude of
a correlation between variables, without any
implication of causation. Therefore, the ρ
symbols are always associated with double-headed
arrows.

The absence of a link between two variables
indicates that the model being tested
assumes that the two variables are not
linked, that is, that the appropriate β or ρ is
identically zero. 

This model makes strong predictions 
about the observed correlations among the 
four tests. Consider the two unspeeded tests, 
the x variables. If the model is accurate, 
the covariance between them, Cov(x1, x2),
should depend only on (a) their respective
variances and (b) the two parameters β11 and
β21. A similar argument holds for the y variables.
But consider the case of the covariance
between an unspeeded and a speeded
test, for example, x1 and y2. The covariance
should depend on the test variances and
three parameters, β11, ρ, and β32. Generalizing,
any assignment of values to the parameters
will imply predicted covariances (and
therefore correlations) between all pairs of
observed tests. These form the predicted
covariance matrix, Σ. It can be compared to
the observed covariance matrix, S. What confirmatory
factor analysis does is to find the
set of values for parameters that brings Σ
as close as possible to S. This is exactly the
same logic as is used, for instance, to esti¬ 
mate the mean and variance of a population 
from a sample. Only the details are more 
complicated. 
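
To see how a choice of parameter values implies a whole covariance matrix, consider the following minimal sketch of the two-factor model of Figure 4.4. It assumes the common convention that both latent traits have unit variance, so that ρ is their correlation, and it measures closeness with the maximum-likelihood discrepancy function, which is one widely used choice; the source does not say which measure Joreskog and Sorbom's program used. The function and argument names are hypothetical.

```python
import numpy as np


def implied_covariance(load_unspeeded, load_speeded, rho, residual_var):
    """Predicted covariance matrix for tests ordered x1, x2, y1, y2.

    load_unspeeded: loadings of x1, x2 on the unspeeded trait.
    load_speeded:   loadings of y1, y2 on the speeded trait.
    rho:            correlation between the two traits (unit variances assumed).
    residual_var:   four residual variances, one per test.
    """
    loadings = np.zeros((4, 2))
    loadings[:2, 0] = load_unspeeded     # x tests load only on the first trait
    loadings[2:, 1] = load_speeded       # y tests load only on the second trait
    trait_corr = np.array([[1.0, rho], [rho, 1.0]])
    return loadings @ trait_corr @ loadings.T + np.diag(residual_var)


def ml_discrepancy(observed, predicted):
    """Maximum-likelihood discrepancy; zero when the matrices are identical."""
    n_tests = observed.shape[0]
    _, logdet_predicted = np.linalg.slogdet(predicted)
    _, logdet_observed = np.linalg.slogdet(observed)
    trace_term = np.trace(observed @ np.linalg.inv(predicted))
    return logdet_predicted + trace_term - logdet_observed - n_tests


if __name__ == "__main__":
    # Arbitrary illustrative parameter values, not estimates from the data.
    sigma = implied_covariance([7.5, 7.7], [8.5, 8.7], 0.9, [30.0, 27.0, 25.0, 22.0])
    print(np.round(sigma, 1))
```

Fitting the model then amounts to handing the discrepancy to a numerical optimizer that adjusts the loadings, ρ, and the residual variances until the value is as small as possible.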


4.2.4. Statistical Evaluation 
and Model Comparison 

Let M be a collection of parameters, some 
of which have been estimated and some of 
which have been set by theory. We will refer 
to this as a model. Figure 4.4 is an example 
of a model. Any model M implies a predicted
covariance matrix, Σ(M), that can
be compared to the observed covariance 
matrix, S. Statistical inference techniques 
can be applied to determine the extent to 
which the two agree. This is called the 
fit of the model to the data. Evaluations 
of fit can include determining whether or 
not the differences are greater than would 
be expected “by chance.” This is logically 
identical to testing a null hypothesis, as 
described in introductory statistics courses. 
(The computations are quite a bit more 
complicated!) 

It is often possible to go beyond evaluating
fit. We can compare two different models,
M1 and M2, to each other. The simplest,
and often the most useful, case is one in
which the parameters to be estimated in M2
are a subset of those to be estimated in M1.
This can be illustrated using the model in
Figure 4.4, which will serve as M1.

M1 assumes that two ability factors are
involved, the ability to take speeded vocab¬ 
ulary tests and the ability to take unspeeded 
ones. However, the model allows for the 
(reasonable) possibility that the two may 
be correlated, that is, that people who do 
well on unspeeded tests tend to do well 
on speeded tests, but that the two talents 
are not exactly the same. The extent of the 
correlation is indicated by the ρ parameter,
whose value is to be estimated from the data.
A fairly obvious alternative is to assume that
all vocabulary tests tap the same underlying
trait, strength of vocabulary, and that it does
not matter if the tests are speeded or not. If
that is the case, Figure 4.4 should be modified
to show just one latent trait, f, instead
of f1 and f2. Call this model M2.

Here is an important insight. Saying that 
there is one trait is equivalent to saying that 
there are two traits but that they are perfectly
correlated, ρ = 1. Therefore, M2 has
the same parameters as M1, but, and it is
a very important “but,” in M2 ρ is fixed at
1 instead of being estimated from the data.
This has two important consequences. The
first is that M2 cannot have a better fit than
M1, because the computational techniques
have ensured that in M1 ρ has the best possible
value it can have in order to get a good
fit to the data. The second consequence is
that we can compare the fits of the two
models, to determine whether M1 fits the
data better than M2 by more than would be
expected by chance. Going back to the example, we
can determine whether the data requires 
that we assume separate vocabulary skills 
for speeded and unspeeded tests, or whether 
it is possible to make the simpler assump¬ 
tion that all the tests evaluate a single trait, 
vocabulary knowledge. 
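
The text does not name the statistic used for this comparison. One common way to formalize it, offered here as an assumption rather than as the method of the original study, is the chi-square difference (likelihood-ratio) test for nested models. Because M2 differs from M1 only in fixing ρ, it has exactly one more degree of freedom:

```latex
\[
\Delta\chi^2 = \chi^2(M_2) - \chi^2(M_1), \qquad
\Delta df = df(M_2) - df(M_1) = 1 .
\]
```

If the difference is large relative to a chi-square distribution with one degree of freedom, the two-trait model M1 is retained; otherwise the simpler single-trait model M2 is preferred.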

This illustrates a very important advance 
in theory construction. Confirmatory factor 
analysis permits testing of the relative accu¬ 
racy of different models as summaries of the 
same data. 

In this and subsequent chapters the 
reader will be presented with several mod¬ 
els of the sort just described. The models 
will typically involve many more observable 
and latent variables, and the structure of the 
models will be more complex than in the 
simple example, but the principle of anal¬ 
ysis will be the same. Confirmatory factor 
analysis allows us to establish a competi¬ 
tion between models, and this is how science 
proceeds. 4 

4.2.5. Limits on the Generality
of Factor Analysis 

The development of modern factor analy¬ 
ses has greatly enhanced the precision of 
psychometric research. This point has not 
always been appreciated by critics, who 
sometimes appear to be attacking the field 
for deficiencies in techniques that were 

4 See Bouchard (2009). I have skipped over a num¬ 
ber of technical details. It is possible, although rare 
in practice, to find situations in which alternative 
solutions have such close fits that confirmatory fac¬ 
tor analysis cannot distinguish between them. 


abandoned years ago. 5 Nonetheless, there 
are some limitations on generalizing the 
results of factor analytic research. 

Factor analytic studies are often used to 
reveal the nature of underlying, unobserv¬ 
able traits, especially general intelligence. 
Tests with loadings close to 1 are impor¬ 
tant pieces of evidence, because a loading of 
1 indicates that variation in observable test
performance is perfectly linked to variation 
in the trait. For instance, the argument that 
the ability to detect patterns is an important 
part of general intelligence is supported by 
the fact that progressive matrix tests often 
have high loadings on the g factor, and that 
pattern detection is an important step in 
solving progressive matrix problems. In this 
case, we say that progressive matrix tests are 
markers for general intelligence. 6 

Unfortunately, it is easy to misinterpret 
this statement. A high loading between a 
trait and a test does not mean that the qual¬ 
ity indicated by the trait is required for 
performance on the test in the way that, 
say, strength is required to push a piano 
across your living room. It means that varia¬ 
tion in test performance, across people, is 
related to the way the trait varies, across 
people. Loadings are statements about indi¬ 
vidual differences as observed in a partic¬ 
ular group. A person’s standings on both 
trait and test are defined relative to the 
performance of other individuals in the 
group, not by the individual’s absolute per¬ 
formance. The mathematics whiz in high 
school may not be that outstanding at MIT. 
This has an important implication for factor 
analysis. 

The test loadings derived from factor anal¬ 
ysis result from an interaction between the 
properties of the test itself, the other tests in 
the battery, and distribution of traits in the 
population being tested.

5 This is especially true of widely cited attacks by 
the evolutionary biologist Stephen J. Gould (1981, 
pp. 252-255). Surprisingly, the error was repeated in 
a revised edition of this work (Gould, 1996) even 
though the error had been commented upon by 
reviewers of the original work. 

6 Jensen, 1998. 
Some thought experiments illustrate
what this means. 

Here is a list of nine possible tests: 

1. A vocabulary test. 

2. A test of paragraph comprehension 
based on newspaper editorials. 

3. A test requiring judgments of the syntax 
of English sentences. 

4. A test requiring people to evaluate the 
logical argument contained in a political 
speech. 

5. A test in which people are asked to 
judge whether or not meaningless pat¬ 
terns presented at different orientations 
are identical or whether one is the mir¬ 
ror image of another. This means that 
the examinee has to rotate figures “in 
the mind's eye.” For this reason such 
tests are called rotation tests. 

6. A test in which people view a room 
(e.g., an office) and then imagine that 
they are at one location, facing in a par¬ 
ticular direction, and point to a target 
object. For example, “imagine that you 
are at the desk, facing the file cabinet. 
Point to the flower pot.” 

7. A test in which the task is to find a
picture hidden in another; for exam¬ 
ple, finding a triangle in a Star of David 
figure. 

8. A map-reading test. 

9. A word problem test in which peo¬ 
ple read passages about an explorer 
going through a jungle and then answer 
questions about the spatial layout of 
objects. To illustrate: “As Henry pro¬ 
ceeded southward, he realized that the 
python was keeping up with him on the 
right and the tiger on the left.” Ques¬ 
tion: Was the python west of the tiger? 

The first four tests evaluate language skill; 
the next four evaluate visual-spatial reason¬ 
ing; and the last test evaluates both language 
skills and visual-spatial reasoning. Suppose 
that these tests have been constructed so 
that the scores are normally distributed in 
the high school population. Because visual- 
spatial and language skills are correlated in 
the high school population, it would be 


possible to extract a general factor and resid¬ 
ual spatial and verbal factors. 

Imagine two further studies. In the first 
we give the test battery to a representative 
sample of lawyers. In the second it is given to 
a representative sample of architects. What 
would happen to the factor loadings? I sug¬ 
gest the reader stop and think a moment. 

Lawyers are highly selected for language- 
related skills. Therefore, the four tests of 
language skills, although appropriate for a 
high school sample, would be quite easy for 
the lawyers. As a result there would be lit¬ 
tle variation in scores on the verbal tests. 
Lawyers are not selected for their visual- 
spatial skills, so we might find considerable 
variation on the spatial tests. An analysis 
for a single general factor would result in 
a factor with high loadings on spatial tests, 
because that is where the variance would be. 
Test 9, which draws on both verbal and spa¬ 
tial skills, would have a high loading on the 
spatial residual factor and a low loading on 
the verbal factor. 

The spatial skills tests would be easy for 
the architects, while tests of verbal skills 
should show more variation. Accordingly, 
it is likely that the general factor, as defined 
for architects, would have high loadings on 
verbal tests. Test 9 would now have a high 
verbal loading and a low spatial loading. 

These changes in loading have nothing to 
do with the absolute abilities of the indi¬ 
viduals involved. They are driven by the 
amount of variation in the population, not 
the level of skill. The examples are not con¬ 
trived; they represent a situation that occurs 
often in the literature. It is common prac¬ 
tice to conduct psychological studies using 
college students as participants. College stu¬ 
dents are selected largely on verbal skills 
and on general intelligence. As a result, the 
variation among college students on these 
traits is less than the variation in the popu¬ 
lation. Therefore, a study of the importance 
of intelligence in college students is likely 
to underestimate the importance of intelli¬ 
gence (or verbal skills) in the population at 
large. 
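
The attenuation produced by selection can be illustrated with a small simulation. This is a minimal sketch, not an analysis of real data: the population correlation of .6 between verbal skill and some outcome, the 20% selection cutoff, and the names `TRUE_R`, `verbal`, and `outcome` are all assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(0)
TRUE_R = 0.6   # assumed population correlation (illustrative value)
N = 20000

verbal = [random.gauss(0, 1) for _ in range(N)]
# outcome = TRUE_R * verbal + independent noise, scaled to unit variance
outcome = [TRUE_R * v + (1 - TRUE_R ** 2) ** 0.5 * random.gauss(0, 1)
           for v in verbal]

# A selected group: keep only the top 20% on verbal skill
cutoff = sorted(verbal)[int(0.8 * N)]
selected = [(v, o) for v, o in zip(verbal, outcome) if v >= cutoff]

r_population = pearson(verbal, outcome)
r_selected = pearson([v for v, _ in selected], [o for _, o in selected])

print(f"whole population: r = {r_population:.2f}")
print(f"top 20% only:     r = {r_selected:.2f}")
```

Selection leaves each person's skill unchanged. Only the spread of verbal scores in the sample shrinks, and the correlation shrinks with it.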

The next thought experiment deals with 
another problem that limits the generality 
of factor analytic findings. The loading of
a test on a trait can vary depending upon 
what other tests there are in the battery, 
even though the same population is studied. 

Suppose that instead of using all nine 
of our tests we compose a smaller battery 
consisting of the first five tests - four tests 
of language skills and the rotation test. The 
test battery is given to a representative set 
of adults. We then extract a single general 
factor from the data. It would be defined by 
the traits required by the four verbal tests. 
The single spatial test (test 5) would have a 
low loading on the general factor and a large 
residual term. The vocabulary test (test 1)
would have a high loading on the general 
factor, for it has repeatedly been found that 
people with high general verbal skills tend 
to have large vocabularies. 

Now construct a new battery consisting 
of the vocabulary test and the four visual- 
spatial tasks, tests 5-8. The general factor will 
be determined largely by the visual-spatial 
tasks. The rotation test would have a high 
loading on this factor, while the vocabulary 
test would have a low loading and a high 
residual term. 
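
The dependence of loadings on battery composition can be sketched numerically. The example below assumes, purely for illustration, that tests of the same kind correlate .6 and tests of different kinds correlate .3. It uses the first principal component (found by power iteration) as a simple stand-in for an extracted general factor, which is not the same as a full factor analysis. The function names are hypothetical.

```python
def first_component_loadings(corr, iterations=500):
    """Loadings on the first principal component, found by power iteration."""
    n = len(corr)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    # loading = eigenvector entry scaled by the square root of the eigenvalue
    return [eigenvalue ** 0.5 * x for x in v]


def battery_corr(kinds, within=0.6, across=0.3):
    """Correlation matrix for a battery; kinds is a list of 'V' or 'S' labels."""
    return [
        [1.0 if i == j else (within if a == b else across)
         for j, b in enumerate(kinds)]
        for i, a in enumerate(kinds)
    ]


battery_a = ["V", "V", "V", "V", "S"]   # tests 1-4 plus the rotation test
battery_b = ["V", "S", "S", "S", "S"]   # vocabulary test plus tests 5-8

load_a = first_component_loadings(battery_corr(battery_a))
load_b = first_component_loadings(battery_corr(battery_b))

print("Battery A: vocabulary %.2f, rotation %.2f" % (load_a[0], load_a[4]))
print("Battery B: vocabulary %.2f, rotation %.2f" % (load_b[0], load_b[1]))
```

The same two tests trade places: whichever kind of test dominates the battery defines the first component, and the lone test of the other kind receives the lower loading.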

These two thought experiments show 
what might happen, not what does. In prac¬ 
tice, investigators use similar test batteries, 
which leads to replicability of results, but at 
the same time produces a restricted defini¬ 
tion of intelligence, because the same mental 
skills are being evaluated from experiment 
to experiment, in only slightly different 
ways. Factor structures for a given battery 
are usually replicated across populations 
(college students, military recruits, popula¬ 
tion samples of adults), although there are 
occasional failures to replicate. The restric¬ 
tions on generality developed here are warn¬ 
ings to keep in mind, not compelling argu¬ 
ments for throwing out all factor analytic 
studies. 

4.3. The Theory of General Intelligence (g)

The general intelligence (g) theory is at once 
the best-known, most praised, and most
vilified psychometric model of intelligence.
The theory asserts that individual differ¬ 
ences in cognitive performance are very 
largely due to individual differences along 
a single dimension of mental competence, 
general intelligence (g). Psychologists who 
advocate the g model acknowledge that 
there are other dimensions of mental per¬ 
formance, such as verbal or visual-spatial 
reasoning, but claim that these dimensions 
account for much less of the variation in 
cognitive performance than the g dimension 
does. 

4.3.1. Early Development of the Theory

The g model was first stated by Charles 
Spearman, an Englishman who has some 
claim to be the first modern psycho¬ 
metrician. 7 After serving for some time as an 
army officer, Spearman entered academia, 
receiving a Ph.D. from Wilhelm Wundt’s 
experimental psychology laboratory. He was 
attracted to Galton’s concept of intelli¬ 
gence as generalized mental fitness, but did 
not share Galton’s enthusiasm for what we 
would, today, call information-processing 
measures of cognition. Instead he thought 
that one should find a general factor through 
the analysis of complex measures of think¬ 
ing, such as school grades. In order to inves¬ 
tigate his ideas he developed both the basis 
for early factor analysis and the rank-order 
correlation coefficient, thus making substan¬ 
tial contributions to the budding science of 
statistics. David Wechsler, the developer of 
the WAIS, was a student of Spearman’s, as 
was Raymond Cattell, whose work will be 
described later in this chapter. Spearman’s work also
heavily influenced the ideas of Philip E. Ver¬ 
non, Hans Eysenck, Cyril Burt, and Arthur 
Jensen, all of whom have made major con¬ 
tributions to the field of intelligence. 

Table 4.2 presents the correlations
between a set of school grades that Spear¬ 
man used as evidence for g. Following Spear¬ 
man’s practice, the classes are arranged in 
order of the average correlation between one 
grade and the other grades. To illustrate, 

7 Spearman, 1904, 1923, 1927. 
Table 4.2. The correlations between school grades used by Spearman as an illustration of
the theory of general intelligence (g). The figures in parentheses are the predicted
correlations when Carroll (1993) applied the theory using modern computing methods.

Class        Classics  French     English    Mathematics  Pitch      Music
Classics     1         .8? (.84)  .76 (.77)  .70 (.72)    .66 (.64)  .63 (.62)
French                 1          .67 (.70)  .67 (.66)    .65 (.60)  .57 (.57)
English                           1          .64 (.60)    .54 (.54)  .51 (.52)
Mathematics                                  1            .45 (.50)  .51 (.48)
Pitch                                                     1          .40 (.43)
Music                                                                1


Classics, the first grade on the table, has the 
highest average correlation with the other 
grades. 

Spearman argued that a student’s grade
in a class could be thought of as a measure
of the sum of the student’s generalized
intelligence and the student’s unique
abilities and knowledge relevant to the class.
Algebraically this is

    $x_{ij} = \beta_j g_i + s_{ij}$,    (4-0)

where $x_{ij}$ is person i’s grade on topic j,
$g_i$ is person i’s general intelligence, and $s_{ij}$
is person i’s ability on a trait that reflects
those aspects of test j that are not associated
with general intelligence. $\beta_j$ is a term that
reflects the importance of the general factor
in determining the grade in the jth class. In
modern terms, $\beta_j$ is the factor loading of test
j on the general factor, and $s_{ij}$ is the residual
term for person i on test j.
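
One consequence of this additive model is that the correlation between two grades equals the product of their loadings on the general factor. The simulation below is a minimal sketch of that property. It assumes that general intelligence and the residuals are standardized and mutually independent, and the three loadings are hypothetical values, not Spearman's or Carroll's estimates.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(1)
N_PEOPLE = 20000
# Hypothetical loadings of each class grade on the general factor
loadings = {"classics": 0.9, "mathematics": 0.7, "music": 0.6}

grades = {topic: [] for topic in loadings}
for _ in range(N_PEOPLE):
    g = random.gauss(0, 1)                        # general intelligence
    for topic, a in loadings.items():
        # residual scaled so that each grade has unit variance
        s = random.gauss(0, (1 - a * a) ** 0.5)
        grades[topic].append(a * g + s)

pairs = [("classics", "mathematics"), ("classics", "music"),
         ("mathematics", "music")]
simulated = {p: pearson(grades[p[0]], grades[p[1]]) for p in pairs}

for (first, second), r in simulated.items():
    product = loadings[first] * loadings[second]
    print(f"{first}-{second}: simulated r = {r:.2f}, "
          f"product of loadings = {product:.2f}")
```

This product rule is what makes it possible to compute the correlations a single-factor model predicts and compare them with the observed ones.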

There were no computers in Spearman’s 
day, so he used clever but, by today's stan¬ 
dards, archaic methods of estimating factor 
loadings. Almost one hundred years later a 
modern psychometrician, John Carroll, fac¬ 
tor analyzed Spearman’s data. 8 The resulting 
model is shown in Figure 4.5. The parenthe¬ 
sized terms in Table 4.2 are the correlations 
that would be expected if the g model were 
an accurate summary of the data. As can 
be seen, the observed and predicted correlations
(the S and T matrices of section 4.2)
are quite close. Spearman’s theory is clearly 
supported by this data. 

8 Carroll, 1993, p. 38. 


Although Spearman and his modern fol¬ 
lowers emphasized the importance of g, 
they recognized the existence of group fac¬ 
tors, intellectual skills, such as facility with 
language or facility in dealing with visual- 
spatial patterns, that are less general than g 
but more general than an ability to take a 
specific test. Figure 4.6 shows a proposed 
analysis of our eight hypothetical tests in 
terms of Spearman’s expanded model. All 
tests would be expected to load on the g 
factor, but the verbal tests would also be 
expected to load on a language skill factor, 
and the spatial tests would be expected to 
load on a visual-spatial factor. 

In Figure 4.6 the g, language skill, and
visual-spatial skill factors are not allowed
to correlate with each other.
(This is indicated by the absence of any
line connecting the three factors.) Spearman
had to make this assumption because
he did not have access to the computational 
power required to evaluate more compli¬ 
cated models. Modern computing allows us 
to consider a more sophisticated version, 
the hierarchical model shown in Figure 4.7. 
In this model Spearman’s group factors are 
referred to as (broad) second-order factors, 
and g, a third-order factor, is inferred from 
the correlations between the second-order 
factors. We will see several other examples 
of hierarchical models subsequently. 
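
What the hierarchical model implies for the correlations between tests can be written out explicitly. The following is a sketch under assumptions not stated in the text: all variables are standardized, all residual terms are mutually uncorrelated, and the symbols $F_L$, $F_S$, $b_j$, $c_L$, $c_S$, $u$, and $e_j$ are introduced here only for illustration.

```latex
\begin{aligned}
F_L &= c_L\, g + u_L, \qquad F_S = c_S\, g + u_S
    && \text{(each group factor depends partly on } g\text{)} \\
x_j &= b_j\, F_{k(j)} + e_j
    && \text{(test } j \text{ loads on its own group factor } k(j)\text{)} \\
\operatorname{corr}(F_L, F_S) &= c_L\, c_S
    && \text{(the group factors are now correlated)} \\
\operatorname{corr}(x_j, x_{j'}) &= b_j\, b_{j'}
    && \text{(two tests in the same group)} \\
\operatorname{corr}(x_j, x_{j'}) &= b_j\, b_{j'}\, c_L\, c_S
    && \text{(two tests in different groups)}
\end{aligned}
```

Under these assumptions, tests in different groups still correlate positively, but only through g, and the implied loading of test j on g is the product $b_j c_{k(j)}$.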

4.3.2. The Evidence for g

Spearman’s model has lasted longer than 
most psychological theories do. Arthur 
Jensen, a prominent modern advocate of 
Figure 4.5. Loadings and residual terms when the g model was
applied to Spearman’s school grade data. Loadings calculated by 
Carroll (1993), residuals calculated by the author. 


g theory, has claimed that after literally a 
century of exploration the g model provides 
a simple, accurate summary of a massive 
number of studies of intelligence. 9 Jensen 
built his case on three lines of evidence. 

The first is the widespread observation of
positive manifold. Virtually all tests of cog¬ 
nition are positively correlated. The second 
line of evidence is that measurements of gen¬ 
eral intelligence are among the best predic¬ 
tors of performance both in school and in 
the workplace. We will expand on this point 
in Chapter 10. The third line of evidence 
is that measures of g are related to a rel¬ 
atively small set of information-processing 
functions and, perhaps more importantly, 
to certain brain-based and genetic mea¬ 
sures. This point is discussed in Chapter 6. 
Here we look only at the psychometric 
evidence. 

Positive manifold is a widespread statis¬ 
tical phenomenon. The following examples 
show its ubiquity. 

Gilles Gignac, an Australian psychome¬ 
trician, analyzed four widely used American 
tests, the KAIT, the WAIS-R and WAIS- 
III, and a multidimensional aptitude battery 
(MAB) designed to cover a wide variety of 

9 Jensen, 1998. 


cognitive skills. 10 While all analyses showed 
substantially the same results, the WAIS-III 
analysis is of particular interest because the 
standardization sample was constructed to 
represent the US adult population, using a 
sample of just under 2,500 people ranging in 
age from sixteen to eighty-nine. 

Every subtest of the WAIS-III had sub¬ 
stantial loadings on the general factor. The 
two highest loadings were on the Arithmetic 
(.79) and Vocabulary (.75) subtests. 11 This
certainly fits the definition of a general fac¬ 
tor, applying to a wide variety of cognitively 
demanding tasks. The lowest loadings were 
found on the Digit Symbol and Coding subtests
(.53 and .56), which are essentially tests
of short-term memory. A matrix test simi¬ 
lar to the RPM described in Chapter 2 had a 
loading of .70, which is consistent with the 
argument that progressive matrix tests are 
good markers for general intelligence. 12 

In addition to being ubiquitous, the g fac¬ 
tor was substantial. In Gignac’s study the 
general factor accounted for from 30% to 

10 Gignac, 2006. Gignac’s model somewhat resembled 
Spearman's model (Figure 4.6), but the treatment
of residual effects was much more sophisticated. 

11 See Chapter 2 for a description of WAIS-III subtests. 

12 Jensen, 1998, pp. 37-38. 
Figure 4.6. Spearman’s g model expanded to allow for group
factors. In this hypothetical example the battery consists of eight 
tests. All eight have loadings on the g factor. In addition, the four 
tests involving language have a loading on a language-skill group 
factor, while the four tests involving visual-spatial reasoning have a 
loading on a visual-spatial factor. The two group factors are 
uncorrelated. 


60% of the variance in scores on individual 
tests. 
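
Loadings and percentages of variance are linked by a simple rule: with standardized scores, the proportion of a test's variance attributable to a factor is the square of the test's loading on that factor. The short sketch below applies this rule to the WAIS-III loadings quoted above. It illustrates the general convention and is not a reconstruction of the published analysis; the dictionary labels are shorthand.

```python
# General-factor loadings as quoted in the text for the WAIS-III analysis
loadings = {
    "Arithmetic": 0.79,
    "Vocabulary": 0.75,
    "Matrix test": 0.70,
    "Coding": 0.56,
    "Digit Symbol": 0.53,
}

# With standardized scores, the squared loading is the proportion of a
# test's variance attributable to the general factor.
variance_share = {test: loading ** 2 for test, loading in loadings.items()}

for test, share in variance_share.items():
    print(f"{test:12s} loading {loadings[test]:.2f} -> {share:.0%} of variance")
```

A loading of .79 corresponds to a little over 60% of the variance and a loading of .53 to a little under 30%, which is in line with the range reported for individual tests.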

The American results are mirrored by 
findings in other developed countries. Deary 
and his colleagues analyzed the results 
from a study in which the Cognitive Abil¬ 
ities Test was given to virtually all mid¬ 
dle school children in England. 13 This test 
was designed to provide separate scales for 
verbal, quantitative, and nonverbal reason¬ 
ing, but the scales are not necessarily sta¬ 
tistically independent. 14 Indeed they were 
not. The general factor accounted for 70% 

13 Deary, Strand, Smith, & Fernandes, 2007. 

14 See Chapter 2 for a description of this test. 


of the variance. In another British study 
Elliott 15 examined results from examination 
of schoolchildren of varying ages, using the 
British Ability Scales. Elliott’s study was a 
challenging test of the g model, because the 
British Ability Scales were designed to pro¬ 
vide clinical and educational psychologists 
with a more differentiated view of cogni¬ 
tive abilities than that provided by g-loaded 
tests, such as the WAIS. Nevertheless, the 
general factor accounted for 30% of the 
variance on these scales, even though they 
had been designed to deemphasize general 
intelligence. 

15 Elliott, 1986. 
Figure 4.7. The example in Figure 4.6 reformulated as a
hierarchical model. The two group factors are now partially caused 
by g, and thus would be correlated. This is the most widely 
accepted model of intelligence today. 


Roberto Colom and his colleagues at the 
Universidad Autonoma in Madrid have ana¬ 
lyzed the normative data from the Spanish 
version of the WAIS-III, doing separate 
analyses for groups with different levels of 
education. 16 The g factor accounted for 45% 
of the variance in the two least-educated 
groups, and 25% of the variance in the 
two better-educated groups. This turns out 
to be a typical finding. The general fac¬ 
tor is often less powerful as cognitive skills 
increase. 

Psychologists from the University of 
Tilburg in the Netherlands administered 
three different cognitive batteries to groups 

16 Colom, Abad, Garcia, & Juan-Espinosa, 2002. 


of immigrant and native Dutch students. 17 
Depending on the battery, the g factor 
accounted for from 30% to 45% of the vari¬ 
ance. The g-loadings of the various sub¬ 
tests were nearly identical in both the native 
Dutch and the immigrant groups. This sug¬ 
gests a similarity of cognitive structures over 
two different ethnic groups. 

Similar results have been obtained in 
many studies. In 1993 John Carroll pub¬ 
lished an extensive survey of the results from 
many of the studies of batteries of tests that 
had been conducted up to that time. 18 He 
found evidence for a g factor in 146 different 

17 Helms-Lorenz, van de Vijver, & Poortinga, 2003. 

18 Carroll, 1993. 
data sets. There appears to be little doubt
that analysis of virtually any battery of tests 
of cognition, in any population within an 
industrial society, will reveal a g factor. 

As the thought experiments in section 4.2 
showed, demonstrations of positive mani¬ 
fold in different populations and with differ¬ 
ent test batteries do not show that the dif¬ 
ferent studies have found the same g. Carroll 
observed that in order to reach such a con¬ 
clusion one would have to administer several 
test batteries to the same individuals, extract 
the g factor from each battery, and show that 
they were highly correlated. 19 Subsequently, 
Wendy Johnson and her colleagues at the 
University of Minnesota produced just such 
a finding. They administered three different 
test batteries to a group of just under 500 
adults, of widely varying ages. 20 Their results 
strongly supported the contention that the 
g that is extracted by statistical analyses of 
widely used test batteries is the same trait, 
psychologically, regardless of the battery. 
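
The logic of the check Carroll proposed can be sketched with simulated data. The example below is not the analysis Johnson and her colleagues carried out. It assumes that every test in three non-overlapping batteries depends on one common g, uses hypothetical loadings, and substitutes a unit-weighted sum of test scores for a properly extracted factor.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(2)
N_PEOPLE = 5000

# Hypothetical g loadings for the tests in three non-overlapping batteries
batteries = {
    "A": [0.8, 0.7, 0.6, 0.7, 0.5],
    "B": [0.6, 0.6, 0.7, 0.5],
    "C": [0.7, 0.5, 0.6, 0.8, 0.6, 0.5],
}

g = [random.gauss(0, 1) for _ in range(N_PEOPLE)]

composites = {}
for name, loads in batteries.items():
    totals = [0.0] * N_PEOPLE
    for a in loads:
        resid_sd = (1 - a * a) ** 0.5
        for i in range(N_PEOPLE):
            # standardized test score: g part plus test-specific residual
            totals[i] += a * g[i] + random.gauss(0, resid_sd)
    composites[name] = totals   # unit-weighted sum of the battery's tests

pairs = [("A", "B"), ("A", "C"), ("B", "C")]
battery_r = {p: pearson(composites[p[0]], composites[p[1]]) for p in pairs}

for (first, second), r in battery_r.items():
    print(f"battery {first} vs battery {second}: r = {r:.2f}")
```

The composites share no tests, yet they correlate substantially because each reflects the same underlying trait. Each composite still contains test-specific noise, so these correlations understate the agreement between the underlying factors, which in this simulation are identical by construction.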

Because the Johnson et al. study provided
such strong evidence for a general reasoning 
factor as both a statistical and a psycholog¬ 
ical phenomenon, we need to take a look 
at the study itself. The batteries they used 
were the WAIS and two batteries that had 
been designed to sample a comprehensive 
range of verbal, quantitative, and nonverbal 
reasoning abilities. Their sample consisted 
of adults from their twenties to their seven¬ 
ties who were participating in an extensive 
behavior-genetic study of adoptees. The par¬ 
ticipants were generally white, of middle or 
higher socioeconomic class, and from either 
a North American or a European coun¬ 
try. Such a sample is certainly not statisti¬ 
cally representative of the population of any 
country, let alone the world, but it is fairly 
representative of an important segment of 
the population within the industrial/post¬ 
industrial nations. 

There is a massive amount of data show¬ 
ing a simple, clear-cut, and important fact. 
Within the culture of the industrial/post¬ 
industrial world, which is where these 

19 Ibid., p. 596. 

20 Johnson et al., 2004. 


studies were conducted, people can be reli¬ 
ably classified by the extent to which they 
display general intelligence (g), that is, a
tendency to be cognitively competent in a 
wide variety of tasks. Anyone who asserts 
that individuals have to be described by sev¬ 
eral relatively independent cognitive traits is 
simply wrong. A person who is very good 
at verbal reasoning may not be the best at 
quantitative reasoning or reasoning about 
visual displays, but he or she is unlikely to 
be poor at these other tasks. The same argu¬ 
ment applies, of course, to people who are 
very good at quantitative or visual reasoning. 
Their verbal reasoning may not be as good 
as their reasoning in their strongest domain, 
but it is unlikely to be bad. 

4.3.3. What Is the Nature of g?

Positive manifold is a fact. There is no argu¬ 
ing with the existence of g as a statistical 
phenomenon. There is a great deal of argu¬ 
ment over the reason that different evalua¬ 
tions of cognitive performance are virtually 
always positively correlated. 

One possibility is that there is a mental 
trait, cognitive strength, analogous to physi¬ 
cal strength, that people possess to different 
degrees and that, as is the case for phys¬ 
ical strength, mental strength is useful in 
many settings. Let us call this the unitary g 
hypothesis. It implies that tests of cognitive 
competence will be positively correlated, as 
they are. But this is not conclusive evidence. 
Abstractly, if A (unitary g) implies B (pos¬ 
itive manifold), and B is observed, then A 
might be the case, but B might have arisen 
for other reasons. What hypotheses other 
than the unitary g hypothesis imply positive 
manifold? 

Positive manifold would arise if people 
possess distinct, rather specialized mental 
traits, such as verbal and visual-spatial rea¬ 
soning, and these traits are correlated in 
the population, for either environmental or 
genetic reasons. Call this the correlated traits 
hypothesis. Correlated traits would be anal¬ 
ogous to the statistical association between 
blond hair and blue eyes, neither of which 
causes the other. 
A third possibility is that there are separate
cognitive traits but that over the life
span there are positive interactions among 
these traits, so that high or low performance 
in one trait affects the development of oth¬ 
ers. The interaction could be due to biol¬ 
ogy, the environment, or some combina¬ 
tion of the two. For example, children who 
appear to be highly verbal may be singled 
out for special instruction, which improves 
their cognitive capacity, opening yet further 
opportunities. The opposite may also hap¬ 
pen; children who are perceived as being 
slow speakers or readers could be treated as 
if they were not too bright in general, lead¬ 
ing to a self-fulfilling prophecy of failure. 

The first position, that g exists as a 
trait, is the one held by most advocates of 
the general intelligence model. 21 But what 
is the trait behind the statistical abstrac¬ 
tion? To quote from a statement by a 
US president, “it depends on what you
mean by ‘is’,” 22 for the question can be 
answered at several levels. At the psycho¬ 
metric level general intelligence could be 
defined by examining the marker tests that 
have high loadings on the general factor, 
and attempting to identify the common 
cognitive challenges they present. At the 
information-processing level general intel¬ 
ligence could be defined by showing that 
the general factor is highly correlated with 
particular information-processing functions, 
such as working memory, the ability to 
control attention, and general speediness. 
At the biological level general intelligence 
could be associated with individual differ¬ 
ences in brain structures and processes, and 
with variations in the genome. All three 
approaches have been tried. We will look 
at the psychometric evidence in this chap¬ 
ter, saving the information-processing and 
biological evidence for later chapters. 

Spearman's belief that the ability to 
detect patterns is central to intelligence led 

21 See, e.g., Gottfredson, 1997; Jensen, 1998, 2006. 

22 William (Bill) Clinton, commenting on the truth 
of his lawyer’s statement that “there is absolutely 
no sex of any kind” in Clinton’s relationship with 
a White House intern, Monica Lewinsky (Shapiro, 
2006, p. 160). 


his student John Raven to develop the Raven 
Progressive Matrices (RPM) tests described 
in Chapter 2. The effort was successful, for 
these tests do have high loadings on the gen¬ 
eral intelligence factor. Jensen states: 

When the Progressive Matrices test is fac¬ 
tor analyzed along with a variety of other 
tests it is typically among the two or three 
tests having the highest g loadings, usually 
around .80. Probably its most distinct fea¬ 
ture is its very low loadings on any factor 
other than g. 

Jensen, 1998, p. 38 

This is what would be expected if Spear¬ 
man were right, for a person who takes an 
RPM test does have to be able to notice 
visual patterns. However, the matter is not 
quite as clear as Jensen's statement sug¬ 
gests. Other tests, which appear to call forth 
very different sorts of problem solving than 
that required on the RPM tests, also have 
high loadings on g. In particular, vocabu¬ 
lary tests have nearly as high loadings on 
g as do RPM tests, a result that has been
found in samples as diverse as the WAIS-III
normalization sample (representative of the
US population), 23 Swedish schoolchildren, 24
and a large group of adults who volunteered 
for a research project. 25 

After reviewing findings from a great 
number of studies, Carroll 26 found that four 
different classes of tests were consistently 
found to have substantial loadings on g. 
These were tests of inductive reasoning, 
visualization (the ability to imagine move¬ 
ments of relatively complex forms), quanti¬ 
tative reasoning, and verbal ability. He con¬ 
cluded that 

The eventual interpretation [of g] must
resort to analysis of what processes are com¬ 
mon to the tasks used in the measurement 
[of the abilities just listed] and to the anal¬ 
ysis of what attributes of such tasks are 
associated with their difficulties. 

Carroll, 1993, p. 597 

23 Gignac, 2006. 

24 Gustafsson, 1984. 

25 Johnson et al., 2007. I calculated the loadings based
on the data presented on p. 103 of their report. 

26 Carroll, 1993, p. 597. 
It is certainly unclear what single cognitive
process underlies all four of the tests.
Accordingly, let us look at the correlated 
traits hypothesis, the idea that there are 
some cognitive functions which, although 
not what we would normally think of as 
intelligence in themselves, are required for 
a large variety of cognitive tasks. There¬ 
fore, individual differences in these func¬ 
tions would produce g as a statistical phe¬ 
nomenon, even though no such thing as 
g exists as an underlying quality of the 
mind. 

How would this work? Consider an anal¬ 
ogy to carpentry. Carpenters use the same 
tools to make a great many things, ranging 
from tables and kitchen cabinets to fences. 
Suppose that there are individual differences 
in the quality of tools available to different 
carpenters. Instead of thinking of a mod¬ 
ern carpenter, imagine a bit of time trav¬ 
eling, where we ask a modern carpenter, a 
medieval carpenter, a Bronze Age carpenter, 
and a Stone Age carpenter to make us some 
furniture, each using the tools appropriate 
to the historic era. We would not think of a 
saw, hammer, or adze as indicating carpen¬ 
try skill, in itself. Nevertheless, across the 
ages, we could extract positive manifold in 
making furniture solely because of the qual¬ 
ity of the tools. 
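
The same point can be made numerically. In the sketch below each person has fifty elementary "tools" whose quality varies independently, so there is no single general trait by construction, and each of six tasks draws on its own random half of the tools. The numbers of tools and tasks, and all variable names, are assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(3)
N_PEOPLE, N_TOOLS, N_TASKS, TOOLS_PER_TASK = 1000, 50, 6, 25

# Each task relies on its own random subset of the available tools
task_tools = [random.sample(range(N_TOOLS), TOOLS_PER_TASK)
              for _ in range(N_TASKS)]

scores = [[] for _ in range(N_TASKS)]
for _ in range(N_PEOPLE):
    # tool quality is independent from tool to tool: no built-in general trait
    quality = [random.gauss(0, 1) for _ in range(N_TOOLS)]
    for task, used in enumerate(task_tools):
        scores[task].append(sum(quality[tool] for tool in used))

correlations = [pearson(scores[a], scores[b])
                for a in range(N_TASKS) for b in range(a + 1, N_TASKS)]

print(f"{len(correlations)} task pairs, correlations from "
      f"{min(correlations):.2f} to {max(correlations):.2f}")
```

Every pair of tasks is positively correlated simply because the tasks share tools, so a general factor could be extracted from these scores even though none was put in.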

Unlike carpentry tools (but like carpentry 
skills), psychological processes are dynamic.
In an absolute sense, cognitive competence 
rises through adolescence, and declines in 
old age. Short-term memory processes and 
the speed of making simple decisions show 
a similar rise and fall with age. The same 
principle applies to knowledge. Knowledge 
acquired in mathematics can be utilized 
while studying physics. One way to find 
support for the correlated traits hypothe¬ 
sis is to show that practice in solving one 
task involving general intelligence will pro¬ 
duce improvement on another, seemingly 
very different task. 

There are not as many such studies as 
I would like to be able to cite. However, 
two studies do nicely illustrate the issue. 
In one, German children were taught to 


verbalize their strategies when attacking 
matrix problems, and then to solve the task 
by verbal reasoning. This improved scores 
on the (allegedly) nonverbal test. 27 A second,
and even more dramatic, test involved
training children to use the abacus. 28 As 
anyone who has learned to utilize the aba¬ 
cus can testify, this requires concentration 
of attention and recognition of visual pat¬ 
terns. Sudanese children were given an age- 
appropriate RPM test, and then divided into 
experimental groups that received substan¬ 
tial training in the use of the abacus, and 
control groups that did not. Prior to train¬ 
ing, both groups had equivalent RPM scores. 
Following training, the experimental group 
outperformed the control group. 

These two very different studies can be 
used to make the same point. There are 
certain cognitive skills that are useful in a 
variety of contexts. These include verbaliza¬ 
tion, which makes different aspects of the 
problem open to conscious inspection, and 
concentration of attention, which is a more 
primitive operation. No one of these skills, 
alone, is “general intelligence,” but collectively
they are. If the possession of one of
these skills is statistically associated with the 
possession of others, positive manifold, and 
its correlate, g, will result even though no 
single cognitive skill can be pointed to as 
“intelligence.” 

Why should the skills that contribute to 
general intelligence be correlated? One pos¬ 
sibility is that all these skills draw on a com¬ 
mon biological capacity, such as efficiency 
of neural processing, and that there are sub¬ 
stantial individual differences in that capac¬ 
ity. This argument makes positive mani¬ 
fold, and g, manifestations of a biological 
phenomenon. 

Another possibility is that positive man¬ 
ifold emerges from interactions in which 
the development of one cognitive skill facil¬ 
itates another. This is called mutualism. 
For instance, practicing verbalization during 
problem solving might facilitate the ability 

27 Carlson & Weidl, 1992. 

28 Irwing et al., 2008. 
to control attention. If this is the case,
the possession of key cognitive skills would 
become correlated, over time, even though 
they were initially uncorrelated. 29 
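
How mutual facilitation can create correlations that were not there at the start can be shown with a toy simulation. The sketch below uses coupled logistic growth in the spirit of the mutualism account cited above: each skill grows toward its own limit, and growth is boosted in proportion to the current level of the other skills. The growth rate, coupling strength, number of skills, and distribution of limits are assumptions chosen for illustration, not estimates from any study.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(4)
N_PEOPLE, N_SKILLS = 300, 4
GROWTH, COUPLING = 1.0, 0.15   # assumed growth rate and mutual facilitation
DT, STEPS = 0.1, 400           # Euler step size and number of steps


def develop(limits):
    """Euler integration of coupled logistic growth for one person."""
    x = [0.05] * N_SKILLS      # every skill starts near zero
    for _ in range(STEPS):
        total = sum(x)
        x = [xi + DT * GROWTH * xi
             * (1 - xi / k + COUPLING * (total - xi) / k)
             for xi, k in zip(x, limits)]
    return x


# Each person's growth limits are drawn independently for every skill
limits = [[max(0.2, random.gauss(1.0, 0.2)) for _ in range(N_SKILLS)]
          for _ in range(N_PEOPLE)]
final = [develop(person_limits) for person_limits in limits]

r_limits = pearson([k[0] for k in limits], [k[1] for k in limits])
r_final = pearson([x[0] for x in final], [x[1] for x in final])

print(f"correlation of skill limits 1 and 2:  {r_limits:.2f}")
print(f"correlation of final skills 1 and 2:  {r_final:.2f}")
```

The limits are uncorrelated apart from sampling error, yet the developed skills are positively correlated, because a person who happens to have one strong skill gets a boost in all the others.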

To summarize, the statistical evidence for 
g, positive manifold, could be produced by 
a pervasive general intelligence factor, or it 
could be produced by correlations between 
more specialized abilities. This leaves open 
the question of what these abilities are. Sub¬ 
sequently in this chapter we will look at 
a psychometric model that derives g from 
more specialized, correlated traits. First, 
though, we want to consider two reserva¬ 
tions about the generality of findings on gen¬ 
eral intelligence. 

4.3.4. Reservations about g

The extent to which a given study sup¬ 
ports the g model depends on the extent to 
which the tests used show positive manifold. 
If the tests are highly correlated, the g model 
is supported; if they are not, it is not. It 
cannot be stressed too strongly that many, 
many well-designed studies provide results 
that are consistent with the g model. Never¬ 
theless, two reservations are in order. 

The first is that psychologists who have 
studied intelligence have been rather con¬ 
servative in their decision about what sort 
of test is a test of cognition. The bulk of 
the evidence for g has been obtained by 
analyzing tests that fit into the “Drop in 
from the Sky” paradigm. If, as I urged in 
the first chapter, we expand our definition 
of intelligence to include problem solving 
in other situations, then other factors might 
be found, and g might or might not drop 
in power. What might happen is not what 
would happen. Whether studies of cogni¬ 
tion in expanded situations would provide 
evidence for g should be decided by empiri¬ 
cal research, not by the intuitions of people 
who support or oppose the model. There is
a disturbing lack of data on this topic. 

The second reservation is that studies 
of specialized populations often do not 

29 Van Der Maas et al., 2006. 


show strong evidence for g. In part this is 
simply a statistical phenomenon. Take the 
case of college students at a highly selective 
institution, such as Stanford, Harvard, 
or Cambridge. These students have been 
selected by a process that, to a consider¬ 
able extent, evaluates their general reason¬ 
ing ability. Therefore, within the selected 
group there will be a restricted range of 
g, and other factors will determine individ¬ 
ual differences in test scores. The situation 
is analogous to the fact that height is not 
closely related to the ability to score points 
in professional basketball - because almost 
all the players are already very tall. 

This poses a practical problem, because 
in general the populations that are easiest to 
study are the ones where such restrictions 
of the range of reasoning ability may occur. 
We do not have to go to highly selective col¬ 
leges to see this effect; all college and uni¬ 
versity students have been subject to some 
selection on general reasoning ability. The 
same situation occurs when people try to 
obtain voluntary samples from the general 
population. A truly random sample is very 
hard to construct. It is much easier to recruit 
people from the middle and upper socioeconomic
classes (SES) than to recruit people
with low SES. This fact, and many other 
recruitment biases, operate to produce sam¬ 
ples with restricted ranges on g, thus often 
underestimating the importance of the trait 
in the population. 

However, that is not quite all there is to 
studies of populations with restricted ranges 
of g. There are some systematic changes in 
the ubiquity of g that are not solely due to 
statistical phenomena. 

4.3.5. Where g Is Not Found: There Are
Very Few Universal Geniuses

In the 1930s Leon Thurstone, a professor 
at the University of Chicago, challenged 
Spearman’s g model. 30 Thurstone believed
that intelligence is based on several dis¬ 
tinct primary abilities, rather than on a single 

30 Thurstone, 1938; Thurstone & Thurstone, 1941.

general-reasoning factor. The primary abilities
were:

Spatial reasoning: The ability to reason
about figural representations. An
example would be deciding whether
two pictures did or did not represent
the same object viewed from different
perspectives.

Perceptual speed: The ability to detect
simple figures in a display. For an
illustration, try to find the browser icon on
your computer desktop.

Number facility: The ability to do relatively
simple numerical computations
quickly.

Verbal relations: The ability to comprehend
verbal statements.

Word fluency: The ability to produce
simple words and statements rapidly.

Memory: The ability to recall information.
Thurstone did not distinguish
between short-term and long-term
memory, as we certainly would do
today.

Inductive reasoning: Spearman’s ability
to see patterns. A deductive reasoning
factor was added in some of
Thurstone’s work.

Thurstone claimed that these abilities are 
essentially statistically independent; being 
good or poor on one of them does not pre¬ 
dict whether a person is good, poor, or aver¬ 
age on another. This conclusion is diamet¬ 
rically opposed to Spearman’s claim that 
intelligence is largely produced by a single 
general-reasoning factor. 

In the 1940s and 1950s there was debate 
over whether the discrepancy between 
Thurstone’s and Spearman’s results might 
have been due to different groups having 
used different factor analytic methods. The 
development of modern computerized tech¬ 
niques has essentially ended that discussion. 

The discrepancy was probably due in part 
to restriction of range effects. In general, 
Spearman and other British psychologists 
analyzed data from the testing of schoolchil¬ 
dren. While Thurstone did similar studies, 
he also relied a great deal on studies of 


University of Chicago students. Chicago 
was, and is, a highly selective institution, so 
his college sample undoubtedly had a highly 
restricted range of scores on g. A third likely 
reason for the discrepancy has more psycho¬ 
logical content. It could be that the structure 
of intelligence is more differentiated at high 
levels than at lower levels. In concrete terms, 
unusually high scores on a test of verbal rea¬ 
soning might be only moderate predictors of 
unusually high scores on a test of mathemat¬ 
ical reasoning, while unusually low scores on 
the verbal test could be very good predictors 
of unusually low scores on the mathemati¬ 
cal test. To the extent that this is true, factor 
analysis would reveal relatively independent 
factors in a high-ability group, while reveal¬ 
ing a strong g factor in a low-ability group. 

Modern research has shown that this 
is the case. Douglas Detterman, a professor
at Case Western Reserve University,
divided the WAIS-R and WISC standardization
samples into five ability groups. 31
The strength of the g factor was highest 
in the low-ability group, and declined as 
group IQ score increased. Similar results 
have been obtained by other investigators, 
using other tests, in both national and inter¬ 
national settings. 32 

This is an important result, because it is 
relevant to a social issue, the distribution of 
intelligence at high levels of cognitive func¬ 
tioning. High levels of talent appear to be 
fairly specialized, while marginal cognitive 
talent seems to have general effects. Why 
might this be the case? 

In part it is probably due to experience. 
Modern society encourages specialization to 
a much greater extent than past societies 
did. Studies of expertise in a variety of fields, 
ranging from athletics to chess, have shown 
that acquiring a high level of expertise takes 
a great deal of time and effort. 33 At high lev¬ 
els of talent, therefore, social pressures lead 
to a differentiation of cognitive competences 

31 Detterman, 1991; Detterman & Daniel, 1989. 

32 Abad et al., 2003; Colom et al., 2002; Deary et al.,
1996; Hunt, 1995a; Legree, Pifer, & Grafton, 1996.
For some negative evidence see Saklofske et al.,
2008.

33 Ericsson, 1996; Ericsson et al., 2006. 
due to specialized training. But this cannot
be the whole picture, because the differen¬ 
tiation of abilities at high levels (or, con¬ 
versely, generalization at low levels) occurs 
in children, as evidenced by studies on dif¬ 
ferentiation involving the WISC. 

Whatever the reason for this, we will 
want to keep in mind the fact that high lev¬ 
els of talent are specialized when we come 
to a discussion of the implications of intelli¬ 
gence, in Chapter 10. 

4.4. The Three-Stratum Model: Cattell, 
Horn, and Carroll's View of 
Intelligence 

A widely cited alternative to the g model 
is the Gc-Gf or three-stratum model, orig¬ 
inally developed by Raymond Cattell and 
John Horn, then modified by John Carroll. 34

Cattell studied with Spearman in Eng¬ 
land, and then moved to the University 
of Illinois and finally to the University of 
Hawaii. He and his then-student at Illinois, 
John Horn (subsequently a professor at the 
Universities of Denver and Southern Cali¬ 
fornia), believed that Spearman’s theory did 
not give sufficient weight to group factors. 
They were also skeptical of the idea that 
g is a trait. Instead they believed that posi¬ 
tive manifold is due to individual tests draw¬ 
ing on several broad factors. To follow their 
argument, consider that bane of K-12 stu¬ 
dents, the mathematical word problem. 

A train leaves station A and proceeds to 
station B , traveling at 60 miles an hour. 

At the same time that this train leaves A, 
another train leaves B, bound for A, travel¬ 
ing at 30 miles an hour. The distance from 
A to B is 270 miles. How long will it be
before the trains meet? 

A student who tries to solve this problem 
has to know the meaning of words and the 
syntax of English, lexical retrieval ability and 
sentence comprehension. Several facts have to 
be kept in mind as others are received, call¬ 
ing upon short-term memory ability. Numer- 

34 Carroll, 1993; Cattell, 1971, 1987; Horn, 1985; Horn & 
Noll, 1994. 


ical facility is required. These narrowly 
defined abilities are examples of first-stratum 
or primary abilities. The first-stratum abil¬ 
ities are themselves grouped into broader, 
second-stratum abilities. According to Cattell 
and Horn the most important of these are 
fluid intelligence (Gf) and crystallized intelli¬ 
gence (Gc), which they defined as the abil¬ 
ity to deal with new and unusual prob¬ 
lems (Gf) and the ability to apply previously 
acquired knowledge to the current problem 
(Gc). In many contexts visual-spatial ability 
(Gv), the ability to deal mentally with visual 
images, is also important. The primary abil¬ 
ities of inductive and deductive reasoning 
are grouped under fluid intelligence; general 
cultural knowledge and lexical knowledge 
abilities are grouped under crystallized intel¬ 
ligence; and the abilities to compare spatial 
forms and to manipulate spatial forms “in 
the mind’s eye” are grouped under visual- 
spatial ability. 

As the theory has evolved additional 
second-stratum abilities have been defined. 
These include factors for retrieval from 
short- and long-term memory, the ability to 
deal with auditory as well as visual stimuli, 
quantitative ability, and two factors reflecting
cognitive processing speed: one for cognition in
general (cognitive processing speed, Gs) and
another dealing with the speed with which
very simple decisions are made (decision
reaction time, Gt).

Cattell and Horn had long and active 
careers, during which they had colleagues 
who conducted research dealing with such 
things as the processing of auditory and tac¬ 
tile stimuli. In the typical extension of the 
theory a study would be conducted with 
a battery that included some tests already 
identified as markers of abilities found
by previous research, and some new tests 
that explored different primary skills. This 
process inevitably resulted in the defini¬ 
tion of still more second-stratum factors. 
For instance, Stankov used this paradigm to 
extend the Gf-Gc model to tasks involving 
auditory stimuli, such as the ability to
discriminate sounds. 35

35 Stankov & Horn, 1980; Horn & Stankov, 1982. 
Figure 4.8. The structure of the second-stratum and third-stratum
abilities in the WJ-III test. Codes: Gc - crystallized intelligence;
Gf - fluid intelligence; Gq - quantitative ability; Gv -
visual-spatial ability; Glr - long-term storage and retrieval in
memory; Gsm - short-term memory; Gs - cognitive speediness;
g - general intelligence. The loadings are those found in the WJ-III
normalization sample, ages twenty to thirty-nine. 


4.4.1. Extensions and Applications
of the Three-Stratum Model

On the basis of his substantial review of 
the literature, Carroll 36 concluded that the 
Gc-Gf model provided a good fit to most 
of the over 450 data sets that he reana¬ 
lyzed. However, in almost all cases abilities 
at the second stratum were themselves cor¬ 
related. He took this as evidence for a single 
third-stratum factor, general intelligence. 
Conceptually this is the g factor advocated 
by Spearman 37 and, almost a century later, 
by Jensen. 38 

The resulting three-stratum theory has 
been used as a basis for several psychometric 
batteries, including the Woodcock-Johnson 
test battery 39 (WJ-III) and the revised

36 Carroll, 1993. 

37 Spearman, 1904, 1927. 

38 Jensen, 1998. 

39 McGrew, 2005. 


Kaufman Adult Intelligence Test
(KAIT). 40 Table 4.3 lists some of the
primary abilities evaluated by the WJ-III 
battery. In the interests of space, only 
enough of these abilities are listed to give 
the reader an idea of the theory. 41 In the WJ- 
III most of these abilities are represented by 
a single test, and therefore the structure at 
the first-stratum (primary-ability) level, per
se, cannot be tested. 

The heart of the theory lies in the 
broader, second-stratum abilities. These are 
shown in the left-hand column of Table 4.3. 
Reasoning abilities fall under Gf, while 
verbal comprehension and general knowl¬ 
edge abilities fall under Gc. The second- 
stratum abilities are themselves correlated. 
This gives the overall test the structure 
shown in Figure 4.8. All the second-stratum 

40 Alfonso, Flanagan, & Radwan, 2005. 

41 McGrew, 2005. 
Table 4.3. A three-stratum grouping of primary abilities (middle and right-hand columns)
into secondary abilities (left- and right-hand columns) in the three-stratum theory. The
table is not complete, although the major secondary abilities are shown.

Second-Stratum Ability       Subsumed Primary Abilities   Brief Description

Fluid intelligence (Gf)                                   The ability to solve novel problems.
                             Deductive reasoning          The ability to reason from established
                                                          principles or facts.
                             Inductive reasoning          The ability to detect patterns in
                                                          observations.
                             Quantitative reasoning       The ability to reason using numerical and
                                                          mathematical arguments.

Crystallized intelligence                                 The ability to apply previously acquired
(Gc)                                                      knowledge to current problems.
                             Linguistic development       The ability to follow an argument in one’s
                                                          native language.
                             Lexical knowledge            The extent of one’s native-language
                                                          vocabulary.
                             Information about culture    Knowledge of the facts about one’s own
                                                          culture.

General domain-specific                                   Breadth of knowledge of facts and
knowledge (Gkn)                                           principles in specific domains.
                             Geographic knowledge
                             Mechanical knowledge
                             Other tests of knowledge
                             within various domains

Visual-spatial ability (Gv)                               The ability to deal with structured visual
                                                          stimuli.
                             Visualization                Ability to recognize and match visual
                                                          stimuli.
                             Spatial relations            Ability to manipulate visual stimuli “in the
                                                          mind’s eye.”
                             Closure speed                The speed at which familiar visual stimuli
                                                          can be recognized when obscured or
                                                          hidden in other stimuli.

Short-term memory (Gsm)                                   The ability to apprehend and store
                                                          information about the current situation.
                             Memory span                  The ability to repeat back, verbatim,
                                                          information just presented.
                             Working memory               The ability to execute cognitive processes
                                                          on information held in short-term
                                                          memory and to store the results.

Long-term storage and                                     The ability to store information for long
retrieval (Glr)                                           periods of time and to retrieve it.
                             Associative memory           The ability to recall a complex piece of
                                                          information given a previously learned
                                                          association to it.
                             Meaningful memory            The ability to store, retain, and recall
                                                          meaningful pieces of information,
                                                          especially biographical information.
                             Free recall                  The ability to recall, without cues, a long
                                                          list of previously presented pieces of
                                                          information, where each piece is unrelated
                                                          to the others.

Cognitive processing speed                                The ability to execute easy, highly
(Gs)                                                      overlearned cognitive tasks.
                             Pattern recognition          Quick recognition of familiar perceptual
                                                          patterns.
                             Numerical facility           The ability to execute simple arithmetic
                                                          operations quickly.

Decision speed (Gt)                                       Speed of responding in simple decision
                                                          situations.
                             Choice reaction time         Choosing which of a small number of
                                                          pre-defined signals has been presented.
                             Semantic processing speed    Deciding whether a string of letters is or is
                                                          not a word, e.g., CAT vs. TAC.
                             Mental comparison speed      Time required to compare two familiar
                                                          stimuli on some attribute, e.g., deciding
                                                          whether the symbols 'A' and 'a' name the
                                                          same letter.


Note: This table is based on a larger table presented by McGrew (2005), Table 8.3. 


abilities have loadings on the single third-
stratum factor, g.

The WJ-III was standardized on a sample 
of just under 9,000 examinees, of varying 
ages, who were chosen to approximate the 
distribution of a number of demographic 
variables (sex, age, ethnicity) in the United
States. Table 4.4 presents the g loadings for
several of the second-stratum tests, separately
for different age groups. 42 The loadings
are both high (in the .80 range and
above) and remarkably stable over adulthood.
The very high loadings of Gc and Gf

42 McGrew & Woodcock, 2001, Appendix F. 


on the g factor imply a correlation between 
Gc and Gf in the .80-.85 range.
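
The arithmetic behind this implied correlation can be sketched as follows. The sketch assumes, as the hierarchical model does, that the second-stratum factors are standardized and are related to one another only through g; the symbols r and λ are notation introduced here for illustration. Under that assumption the correlation between two second-stratum factors is the product of their g loadings. Using the loadings reported for ages 20-39 in Table 4.4:

```latex
r_{Gc,Gf} \;=\; \lambda_{Gc}\,\lambda_{Gf} \;=\; .91 \times .92 \;\approx\; .84
```

Any residual association between Gc and Gf that does not run through g would make the actual correlation differ from this product.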

The high loadings of second-stratum abil¬ 
ities on g have led some investigators to 
argue that the Gf-Gc distinction is a need¬ 
less complication. For instance, Gustafsson 
has reported a study of over 1,000 Swedish 
sixth-graders in which the data was fit by 
a three-stratum model, with three broad- 
level abilities, Gf, Gc, and Gv. Gf had a 
loading of 1 (identity!) on the third-stratum
g factor. The Gv and Gc loadings were
.80 and .76, respectively. 43 Therefore, the

43 Gustafsson, 1984.
Table 4.4. The g loadings of the second-stratum factors on the WJ-III test, omitting Ga
(auditory ability). Primary abilities are as shown in Table 4.3.

Broad Ability   Age 6-8   Age 9-13   Age 14-19   Age 20-39   Age 40+

Gc              .79       .86        .90         .91         .92
Gf              .96       .89        .92         .92         .94
Gq              .73       .74        .71         .66         .85
Gv              .85       .68        .77         .79         .85
Glr             .80       .75        .80         .95         .89
Gsm             .96       .91        .83         .86         .92
Gs              .61       .49        .49         .51         .75


Source: McGrew & Woodcock, 2001, Appendix F. 


expected correlation between Gf and Gc 
would simply be the Gc loading, .80. This 
result contrasts with an earlier study, using 
rather different factor analytic techniques, 
in which correlations as low as .17 between 
Gf and Gc were reported. 44 I am somewhat 
at a loss to explain this difference. Taking 
the literature as a whole, it does appear 
that in the majority of cases fluid and crys¬ 
tallized intelligence are substantially corre¬ 
lated. But does this mean that they are both 
expressions of an overarching general intel¬ 
ligence, g? 

4.4.2. The Nature of g, Gc, and Gf 
in the Three-Stratum Model 

In the three-stratum model the second- 
stratum factors fall into three groups: the 
two cognitive factors, Gc and Gf; the short- 
and long-term memory factors; and factors 
relating to different sensory modalities, 
notably Gv (visual) and Ga (auditory). 
Almost any test involving cognition will 
involve Gc and Gf, to some degree. How¬ 
ever, a battery might emphasize one or 
the other type of cognition. Whether or 
not memory or sensory modality factors are 
found depends upon the particular batteries 
used. 

Two studies illustrate this point. Gustafs- 
son’s previously cited study of Swedish mid¬ 
dle school children utilized a battery of thir¬ 
teen tests. He recovered three second-order 


factors - Gc, Gf, and Gv. Only two of his 
tests involved auditory presentations, and 
they were both memory span tests. There¬ 
fore, a factor involving these two tests, only, 
could be referred to as either a short-term 
memory or auditory skill factor. Gustafsson 
understandably chose to interpret variation 
in performance as due to short-term mem¬ 
ory ability, as the only auditory component 
of the test involved hearing and remem¬ 
bering words or numbers. The composition 
of Gustafsson’s battery contrasts with the 
extensive auditory tests (e.g., pitch discrim¬ 
ination) used in investigations that have suc¬ 
cessfully searched for Ga. 45 These differ¬ 
ences certainly do not invalidate or weaken 
Gustafsson’s study. They do drive home the 
point that how many human abilities are 
found depends upon how broad a range of 
behaviors is studied. 

The second example involves a case that 
illustrates a common practice. Many inves¬ 
tigators have extracted a general factor 
from tests that were designed for practical 
rather than research applications, such as the 
ASVAB and the SAT. Are the general fac¬ 
tors derived from these tests reflections of 
the cognitive trait that is revealed by anal¬ 
yses of battery-type intelligence tests, such 
as the WAIS and WJ? According to the g 
model, they should be. Jensen, in particu¬ 
lar, has argued that the identification of g is 
almost indifferent to the indicators used in 
a battery. 46


44 Hakstian & Cattell, 1978.

45 Horn & Stankov, 1982; McGrew & Woodcock, 2001.

46 Jensen, 1998, p. 91; 2002.

From the viewpoint of the three-stratum
model this is simply not accurate, as the fol¬ 
lowing example shows. 

The ASVAB was designed for a spe¬ 
cific purpose, predicting performance as an 
enlisted person in the US military services. 
Some of the tests used in the ASVAB are 
combined into a general index, the AFQT. 
(See Chapter 2 for a more complete description.)
A general factor that is highly correlated
with the AFQT can be extracted from
the ASVAB. 47 But is it g? 

In their widely publicized Bell Curve, 48
Richard Herrnstein and Charles Murray cor¬ 
related the AFQT with a variety of indica¬ 
tors of social behavior, ranging literally from 
education to divorce rate to salaries, and 
offered their results as evidence that gen¬ 
eral intelligence has a great deal of influence 
upon success in our society. We will discuss 
these results in some detail in Chapters 10 
and 11. Herrnstein and Murray, and Jensen, 
assumed that g, as revealed by the AFQT, 
is equivalent to a general intelligence factor 
uncovered by other test batteries. Extending 
Gustafsson’s argument, this could be called 
either g or Gf. 

Richard Roberts and his colleagues at 
the Educational Testing Service tested this 
assumption. They combined scores on the 
ASVAB with scores on tests designed to 
measure Gf and Gc separately, and found 
that the general factor on the ASVAB is a 
measure of Gc rather than Gf. 49 This is not 
a small point. Virtually everyone acknowl¬ 
edges that Gc is responsive to education, 
while Gf is less responsive. Regarding the 
AFQT as a measure of Gc rather than a mea¬ 
sure of g or Gf increases our optimism about 
the prospects of improving society through 
education. 

Studies such as Roberts and colleagues’ 
bring home the importance of a method¬ 
ological point made in section 4.2. A fac¬ 
tor is not an invariant property of a test. A 
factor is a statistic produced by the inter¬ 
action between the abilities evaluated in a 

47 Ree & Earles, 1991. 

48 Herrnstein & Murray, 1994.

49 Roberts et al., 2000. 


test battery and the distribution of those 
abilities in the sample being tested. The 
fact that g, Gf, and Gc keep reappearing 
shows that these traits are somewhat invari¬ 
ant across different situations. That is infor¬ 
mative, for it suggests that they are indeed 
basic dimensions of variation in human cog¬ 
nition. Blithely accepting the ubiquity of g 
can produce misleading generalizations. 

4.4.3. What Is a Natural Kind: g, Gf,
or Gc?

A natural kind is a phenomenon that exists 
in nature and is to be discovered. This 
contrasts with an artifactual classification, 
which is constructed by human thought. 
The distinction between men and women is 
a natural kind; the distinction between legal 
and illegal residents of the US or the Euro¬ 
pean Union is an artifactual one. Artifactual 
classifications can be quite useful in some 
settings. Nevertheless, they are the results 
of human action, not the action of a law of 
nature. 

In studies of intelligence the broad sen¬ 
sory modality and memory factors are natu¬ 
ral kinds, for they are defined as individual 
differences in human biological capacities. 
The distinctions among Gc, Gf, and g are 
debatable. 

The Gf-Gc distinction is based upon 
an individual’s social and cultural history. 
Consider how the following problem might 
be solved: 

Two freight trains approach each other on 
a single track. Between them there is a 
brief side route capable of holding only one 
engine or boxcar. (See Figure 4.9.) How
can the two trains pass each other? 

Most readers will see this as a novel problem,
and solve it using a mixture of “reasoning
about new problems” (Gf) and manipulation
of visual images (Gv). In fact, it is a
problem that occurs often in railroad switching
yards, and has a standard solution. 50 The
railroad-experienced reader will treat the 
problem as an exercise in Gc. 

50 Hayes, 2007. 
Figure 4.9. The railroad passing problem. Two trains approach each
other. There is a side route that can hold only one engine or boxcar. 
Is it possible for the trains to pass each other and, if so, how? 


Test developers get around such issues 
by restricting tests of Gc to tasks that draw 
upon “generally culturally accepted knowl¬ 
edge.” This automatically restricts most Gc 
tests to the industrially developed coun¬ 
tries, although the concept of “cultural 
knowledge” applies equally well to, say, a 
nomadic or hunter-gatherer culture. How¬ 
ever, it would have to be tested using ques¬ 
tions appropriate to the hunter-gatherer cul¬ 
ture. A test of Gc in one culture could be a 
test of Gf in another. This is an important 
distinction, for the two types of intelligence 
make different demands on our information-
processing capacities.

In order to solve a novel problem you 
must develop a way of representing it men¬ 
tally. This can be a difficult task, involving 
the development of information structures 
to be held in short-term (“working”) mem¬ 
ory. In order to solve a problem by apply¬ 
ing previously acquired knowledge you must 
have had appropriate experiences and coded 
them in long-term memory in such a way 
that they are accessible in the present con¬ 
text. Working and long-term memory draw 
on different brain processes and structures. 
This argument suggests that Gc and Gf are 
two different natural kinds of ability. 

This argument is not vitiated by demon¬ 
strations that g exists as a statistical phe¬ 
nomenon. Cattell argued that we acquire Gc 
very largely by using Gf to discover appro¬ 
priate problem-solving techniques, a process 
that he referred to as investing Gf in the 
acquisition of Gc. If two people enter into 
some experience that is unfamiliar to both 
of them, the one with the higher Gf will 
learn more from the experience, and end 
up with a higher Gc than the other per¬ 
son. Repeated throughout life, this would 


produce a correlation between Gf and Gc, 
and hence statistical evidence for g even 
though there is no natural kind that corre¬ 
sponds to general intelligence. 
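
Cattell's investment argument can be made concrete with a toy simulation. It is a sketch only: the function names, the size of the learning gain, and the amount of chance variation per experience are hypothetical values chosen for illustration, not estimates from any study. Each simulated person has a fixed Gf and starts with no Gc. Every new experience adds an increment of knowledge that depends partly on Gf and partly on luck and opportunity. No general factor appears anywhere in the model.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def investment_correlation(n_experiences, n_people=5000, gain=0.1, seed=1):
    """Correlation between Gf and accumulated Gc after n_experiences.

    gain is the (assumed) contribution of Gf to what is learned from a
    single experience; the remainder of each increment is random noise
    with unit variance, standing in for luck and opportunity.
    """
    rng = random.Random(seed)
    gf_scores = []
    gc_scores = []
    for _ in range(n_people):
        gf = rng.gauss(0.0, 1.0)  # fixed trait, standardized
        gc = 0.0                  # knowledge accumulated so far
        for _ in range(n_experiences):
            gc += gain * gf + rng.gauss(0.0, 1.0)
        gf_scores.append(gf)
        gc_scores.append(gc)
    return pearson(gf_scores, gc_scores)


if __name__ == "__main__":
    for k in (1, 10, 50):
        print(k, "experiences: r(Gf, Gc) =",
              round(investment_correlation(k), 2))
```

With these assumed values the expected correlation is gain·k / sqrt((gain·k)² + k), which is roughly .10 after one experience, .30 after ten, and .58 after fifty. A factor analysis of such data would recover a general factor, although the only thing built into the simulation is the investment of Gf in Gc.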

The contrary argument, made most 
notably by Jensen, is that there are pervasive 
individual differences in brain processes that 
cause general competence, and that these 
differences are causes of differences on both 
Gc and Gf tasks. 51 There is no way to decide 
between the Cattell and Jensen positions 
from an analysis of the psychometric data 
alone. There is an important nonpsychome¬ 
tric argument in favor of Cattell’s proposal. 

Experimental psychologists show that 
two underlying human capacities are dif¬ 
ferent by showing that they respond to a 
change of conditions in different ways. For 
instance, one of the strongest pieces of evi¬ 
dence for a distinction between conscious 
and unconscious memory systems is that 
conscious memory retrieval falters when 
a person is distracted, while unconscious 
retrieval does not. 52 

This sort of argument can be used to dis¬ 
tinguish between Gc and Gf, using a uni¬ 
versally occurring experimental condition, 
aging. Performance on tests that mark for 
Gf (e.g., matrix problems) declines with 
age over the adult life span. Performance 
on tests that mark for Gc (vocabulary tests 
and tests of cultural knowledge) does not. 
In fact, measures of Gc may increase until 
advanced old age. 53 This provides striking 

51 Jensen, 2006. 

52 Jacoby, Toth, & Yonelinas, 1993. 

53 Horn, 1985, 1986. Horn argued that the decline of Gf
began as early as a person's late twenties. Horn relied mainly
on data from cross-sectional studies. This may have 
led him to overestimate the decline, because it has 
since been shown that there are cohort effects on Gf 
Figure 4.10. Vernon’s structure of intelligence
model. The model contains a g factor and 
two orthogonal group factors - the 
verbal-educational factor and the 
perceptual-mechanical skill factor. 

evidence that Gf and Gc are indeed two sep¬ 
arate processes. We look at this evidence in 
more detail in Chapter 11. 

The body of evidence favors the three- 
stratum theory over a simple general intelligence
model. However, a revision of g theory,
the g-VPR model, deals with the evidence
even better.

4.5. Johnson and Bouchard's
g-VPR Model

By 2000 the three-stratum model had been 
widely accepted. Then, in 2005, two Univer¬ 
sity of Minnesota scientists, Wendy John¬ 
son and Thomas Bouchard, Jr., published 
new analyses that both questioned the three- 
stratum model and offered an alternative. 54 

Johnson and Bouchard began with a 
model of intelligence that had been pro¬ 
posed by the Canadian psychometrician 
Philip E. Vernon in the 1960s, 55 but then 
almost forgotten. Vernon himself barely 
mentioned his model in a 1979 book on 
heredity, environment, and intelligence! 56 

marker tasks. These are discussed in more detail in 
Chapters 10 and 11. For the point being made here it 
does not matter whether the cause of the change in 
Gf and Gc scores is due to aging per se or due to 
cohort effects. The important finding is that the two 
abilities change differently over time. 

54 Johnson & Bouchard, 2005a. 

55 Vernon, 1964, 1965. 

56 Vernon, 1979. 


Vernon’s structure of intelligence model is 
shown in Figure 4.10. It contained three factors:
a general factor (g) and two factors
orthogonal to g. One he identified as a
verbal:educational factor, reflecting the emphasis
upon verbal skills in the educational system,
and the other as a perceptual:motor
factor representing skill in identifying and 
manipulating objects. Vernon also suggested 
the presence of a third special factor, math¬ 
ematical skills, but felt that this was related 
to the perceptual:motor factor. 

Johnson and Bouchard proposed a four- 
stratum model, with the structure shown in 
Figure 4.11. The first, bottom, level consists 
of the primary traits evaluated by individual 
tests, such as a test of the ability to do sim¬ 
ple computations or to solve anagrams. The 
second stratum consists of broader but still 
fairly narrow abilities. For instance, at this 
level there is a distinction between word 
fluency, which is essentially a measure of 
speed of producing verbal associations, and 
verbal comprehension (their term, “verbal”),
which is characterized by vocabulary and 
the understanding of proverbs. A similar 
distinction was made between memory for 
meaningful material and memory for arbi¬ 
trary, experimenter-presented associations, 
such as arbitrary lists of number-noun pairs. 
In all the data sets analyzed they found sub¬ 
stantial correlations between second-level 
factors, which indicated a need for a third 
stratum, in which the number of factors 
would be reduced, and where a second-level 
factor could have loadings on more than one 
third-order factor. 

The third stratum, which is the heart of 
the model, contains three factors - Vernon’s 
verbal and perceptual skills factors and a 
third “perceptual” ability, the ability to envision
motion of a static figure, most clearly
seen in tasks that require rotation of a visual 
figure “in the mind’s eye.” As was the case for 
Vernon’s model, the Johnson and Bouchard 
model does not contain a memory factor. 
This is consistent with research in cognitive 
psychology, which has shown that there are 
many different types of memory. 

Johnson and Bouchard’s third-level fac¬ 
tors were highly correlated, indicating a 



Figure 4.11. The structure of Johnson and Bouchard’s VPR model. 
In the interests of clarity, the figure shows a hierarchy. However, 
the model is better described as having a lattice structure, for 
individual tests may have loadings on more than one 
second-stratum factor, and second-stratum factors may have 
loadings on more than one third-stratum factor. 


need for a fourth stratum. They found 
that they needed only one factor at this 
level, which they identified as general intel¬ 
ligence, g. 

Developing an acronym from the names 
of the third-level factors, Johnson and 
Bouchard refer to their model as the g-VPR 
model. 

4.5.1. Psychometric Evidence 
for the g-VPR Model 

Johnson, Bouchard, and their colleagues 
have presented two arguments for prefer¬ 
ring the g-VPR model to either the g model 
(which it amplifies, rather than replaces) 
or the three-stratum Gf-Gc model. The 
first argument is based on psychometric 
evidence, the second on biological plausi¬ 
bility. We look first at the psychometric 
evidence. 


Johnson and her colleagues made a com¬ 
parative analysis of three different data sets, 
in which they compared the g-VPR model 
to the Gc-Gf model and Vernon’s model of 
orthogonal verbal and spatial abilities. 57 The 
first data set came from 400 adults who par¬ 
ticipated in the Minnesota Study of Twins 
Raised Apart. This is an extensive study 
of twins and their relatives, which will be 
described in detail in Chapter 8, panel 8.7. 
For the present purposes, we can think of the 
study as a large data base in which the same 
adults completed three different battery- 
type tests. The investigators then reanalyzed 
one of the data sets Thurstone had used to 
justify his primary mental abilities model. 
Thurstone's data set contained scores from 
a study done in Chicago in the 1930s. Some 

57 Johnson & Bouchard, 2005a,b; Johnson, te Nijenhuis,
& Bouchard, 2007.
of the participants had taken sixty different
tests. The third data set was based on 46 dif¬ 
ferent tests given to just over 500 seamen in 
the Dutch Navy in the 1960s. 

In all three data sets, the g-VPR model 
fit better than any of the other models. A 
comparison to the three-stratum model is 
particularly informative. Johnson and her 
colleagues subsumed Gc into a verbal fac¬ 
tor (V), identified the Gf factor with g, and 
split the Gv factor into their two perceptual 
factors, roughly the analysis of static visual 
images (P) and the ability to conduct mental
manipulations of visual objects (R). Because 
none of the batteries that they considered 
contained tests involving auditory presenta¬ 
tions, they had no opportunity to uncover 
an auditory (Ga) factor.

The V, P, and R factors were not orthog¬ 
onal, but, as would be expected, the cor¬ 
relation between the P and R factors was 
higher than the correlation between the V 
factor and the other two. While none of 
these samples can be claimed to be rep¬ 
resentative of a particular population, as 
opposed to the standardization samples for 
the WAIS, ASVAB, and WJ tests, one has 
to be impressed by the uniformity of the 
results, obtained over a wide variety of tests 
and using markedly different samples. 

Johnson and Bouchard did not find a need 
to identify broad memory factors, although 
they did identify some specific memory 
factors in the second stratum. This does 
not mean that memory is unimportant to 
cognition; obviously it is. What it means is 
either that memory factors are subsumed by 
one of the broad reasoning factors or that 
memory abilities are highly specific to the 
type of material being memorized. As we 
shall see in Chapter 6, there is evidence 
for subsumption of the ability to deal with 
short-term memories into g, and to view 
long-term memory abilities as being highly 
specific. 

Johnson and Bouchard regard their psy¬ 
chometric evidence as a disconfirmation of 
the three-stratum model. This is a strong 
statement. Whether you accept it or not 
depends upon your approach to statistical 
hypothesis testing. 


If one takes the classic approach to 
hypothesis testing, none of the models, 
including the g-VPR model, accounted for 
the data. In every comparison there was a 
“statistically significant” deviation of the data 
from that predicted by the Gc-Gf, three- 
stratum, original Vernon, or VPR model. 
Johnson and Bouchard took the relativis¬ 
tic approach of comparing the models to 
each other, using a sophisticated elabora¬ 
tion of Bayes' law. 58 This analysis identifies 
the best model within a set of models to 
be compared, rather than testing to see if 
there are significant deviations from a partic¬ 
ular model. The g-VPR model was the clear 
winner in this competition, although it did 
not account for all the data in an absolute 
sense. 
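
The logic of a relative comparison can be sketched in a few lines of Python. This is only an illustration, assuming the generic form of the criterion, BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of observations, and L the maximized likelihood. The model names come from the text, but every log-likelihood and parameter count below is a made-up placeholder, not a value reported by Johnson and Bouchard.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion; lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical fits, invented purely for illustration.
N_OBS = 400
candidate_fits = {
    "Gf-Gc": {"log_likelihood": -10450.0, "n_params": 95},
    "Vernon": {"log_likelihood": -10420.0, "n_params": 98},
    "g-VPR": {"log_likelihood": -10300.0, "n_params": 104},
}

# Score every candidate with the same criterion.
bic_scores = {
    name: bic(fit["log_likelihood"], fit["n_params"], N_OBS)
    for name, fit in candidate_fits.items()
}

# The comparison is relative: pick the lowest BIC within the candidate set.
best_model = min(bic_scores, key=bic_scores.get)

for name, score in sorted(bic_scores.items(), key=lambda item: item[1]):
    print(f"{name}: BIC = {score:.1f}")
print("Preferred model:", best_model)
```

The model with the lowest BIC is preferred within the set being compared. Nothing in the calculation says that the preferred model fits the data in an absolute sense, which is the distinction drawn above.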

4.5.2. Logical Arguments 
for the g-VPR Model 

Johnson and Bouchard were also critical of 
the Gf-Gc distinction on logical grounds. I 
will present an expansion on their argument. 

Gc is supposed to represent the use of 
previously acquired knowledge to solve the 
current problem. But how are you to con¬ 
struct a test for this ability? It makes little 
sense to test a person's level of a skill unless 
the examinee has had a chance to acquire 
the skill. Therefore, tests of Gc have to be 
tests based upon information that is widely 
available in the examinee’s culture. Indeed, 
most Gc markers are of this nature. The 
vocabulary tests used as markers of Gc are 
roughly at the level of vocabulary used in 
television dramas. 

Suppose that tests have been constructed 
such that we can be certain that every exam¬ 
inee has been exposed to the information 
needed to do well on the test. Individual 
differences in test scores will then be pro¬ 
duced by differences in examinees' ability 
to extract this information from their com¬ 
mon experiences. A great deal of cultural 
knowledge is based upon induction from 

58 See Hunt, 2007, for a discussion of Bayes' law. The
test used was the Bayesian Information Criterion
(BIC) developed by Raftery (1995).
experience, rather than explicit instruction.
This is especially true of our ability to understand
the meaning of words in different contexts.
“How are you feeling?” spoken by a
waitress at a restaurant is a different ques¬ 
tion, and requires a different response, than 
“How are you feeling?” spoken by a physi¬ 
cian in an emergency room. In order to 
understand such distinctions a person has to 
recognize patterns of usage. Pattern recogni¬ 
tion is, by definition, Gf, and Spearman's g. 

This muddles the Gc-Gf distinction. A 
test of Gc is either not fair, because it evalu¬ 
ates information to which the individual has 
not been exposed, or it is actually a disguised 
test of Gf. 

Johnson and Bouchard also point out 
another problem with the Gc-Gf distinc¬ 
tion, one that had actually been raised by 
Horn some time earlier. 59 If Gf is an ini¬ 
tial ability that is invested to produce Gc, 
then Gf measures should be more respon¬ 
sive to individual biological variables than 
Gc measures. In at least one case this is 
simply not what happens. The heritability 
coefficients for Gc and Gf are approximately 
the same, in violation of the argument that 
Gc reflects cultural experiences, while Gf 
does not. By contrast, heritability analy¬ 
ses (to be discussed in Chapter 8 in more 
detail) show a coherent pattern of genetic 
association for the variables in the g-VPR 
model. 60 

The g-VPR model aligns with well-known 
neuroscientific findings. The ubiquity of g 
suggests that there are individual differ¬ 
ences in some pervasive brain processes. 
Three candidate processes have been sug¬ 
gested: individual differences in the ability 
to control attention, individual differences 
in the speed and accuracy of neural con¬ 
duction, and individual differences in the 
plasticity of neural connections, all of which 
would affect the ability to acquire and 
retrieve information. Language processing 
and perceptual processing are carried out 
by separate brain systems. At least one bio¬ 
logical distinction mirrors the distinction 

59 Horn, 1998. 

60 Johnson et al., 2007. 


between mental rotation and the analysis of 
static figures. There are large male-female 
differences, in favor of males, in the ability 
to conduct rotation-like tasks. 61 The male- 
female differences in other perceptual tasks 
are much smaller, and in some cases are in 
favor of women. 

All in all, Johnson and Bouchard make a 
persuasive case for their model. 

4.6. Summary and Evaluation 
of Psychometric Theories 

Which psychometric theory is correct? 

Any psychometric theory of intelligence 
has to account for the degree to which posi¬ 
tive manifold exists. People who do well (or 
poorly) on a test of one type of cognitive 
skill generally do well (or poorly) on tests 
of other cognitive skills. This tendency is 
stronger at the bottom than at the top; poor 
performance on test A is a better predictor 
of poor performance on test B than good 
performance on A is of good performance 
on B. There is very little, if any, evidence 
for a negative correlation between cogni¬ 
tive skills. People who are highly competent 
with language may not be as good at visual 
or mathematical tasks as they are at verbal 
tasks, but they are unlikely to be bad at those 
tasks. The opposite reasoning holds. Math¬ 
ematicians and physical scientists are gener¬ 
ally not novelists, although there have been 
a few exceptions, but they are very unlikely 
to be functionally illiterate. 
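
Positive manifold is easy to reproduce in a small simulation. The sketch below assumes the simplest possible generating model, in which each test score is a weighted sum of one general factor and a test-specific component. The test names and loadings are hypothetical. A single general factor is only one of several mechanisms that could produce the same pattern.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical loadings of four tests on a single general factor.
LOADINGS = {
    "vocabulary": 0.7,
    "matrix reasoning": 0.8,
    "mental rotation": 0.6,
    "digit span": 0.5,
}
N_PEOPLE = 5000

scores = {name: [] for name in LOADINGS}
for _ in range(N_PEOPLE):
    g = rng.gauss(0.0, 1.0)  # the simulated person's general ability
    for name, loading in LOADINGS.items():
        specific = rng.gauss(0.0, 1.0)  # test-specific ability plus error
        scores[name].append(loading * g + math.sqrt(1.0 - loading ** 2) * specific)

# Correlate every pair of tests.
names = list(LOADINGS)
correlations = {
    (a, b): pearson(scores[a], scores[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

for (a, b), r in correlations.items():
    print(f"{a} x {b}: r = {r:.2f}")
```

Under these assumptions every pairwise correlation is positive, and each is roughly the product of the two loadings involved.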

These facts indicate that a theory of intel¬ 
ligence has to include something like g. 
But g, alone, is not enough. The debate is 
over the appropriate structure of the broad, 
but not completely general, abilities that lie 
below g in the structure of intelligence. 

The g, three-stratum, and g-VPR mod¬ 
els can all account for the psychometric 
data in a general way. If we look at rela¬ 
tive accuracy, the extensive data gathered by 
Johnson and her colleagues strongly indicate 
that the g-VPR model is a statistical winner. 
However, statistical criteria are not the 

61 Halpern, 2000. 
only criteria by which a theory should be
judged. Two other issues are relevant: the 
utility of a theory as a decision aid and 
the extent to which a theory at one level, 
here a theory of psychometric data, fits into 
theories and facts at a more basic level, here 
information-processing and biological mod¬ 
els of thought. 

In many personnel classification and 
training situations the Gc-Gf model has a 
great deal to recommend it. It makes sense 
to distinguish between candidates who do 
not know but can learn (low Gc, high Gf) 
and those who already have the requisite 
knowledge (high Gc). It is also worth noting
that in industrial and military personnel
screening the issue is usually whether a per¬ 
son has a particular cognitive capability, not 
how he or she came to have it. 

Both the Gc-Gf and g-VPR distinctions 
can be fit into information-processing mod¬ 
els. The Gc-Gf model matches the distinc¬ 
tion between problem solving based on the 
manipulation of working memory and prob¬ 
lem solving based on retrieval of previously 
acquired information. The g-VPR model 
matches the distinction between brain struc¬ 
tures involved in working memory and the 
control of attention (i.e., the g and Gf com¬ 
ponents of the models can be identified with 
the same brain structures). The VPR components
of the g-VPR model are closely tied
to pathways of sensory information process¬ 
ing. As Carroll noted, the Gc component 


of the three-stratum model is very closely 
associated with verbal reasoning. 62 

While g implies positive manifold, so do 
the other models. Positive manifold could 
be produced by the development of separate 
modules of cognition that had positive influ¬ 
ences on each other’s development. Cattell 
and Horn’s idea that Gf is invested in learn¬ 
ing to produce Gc, creating positive mani¬ 
fold as an epiphenomenon, is an example of 
this sort of argument. 

My conclusion is that a pure g model 
is too simplistic, although useful in some 
situations. Whether one should favor the 
g-VPR or three-stratum model depends a 
good bit on what the theory is supposed to 
do - provide a starting place for a theory 
that connects psychology to biology, or pro¬ 
vide a model for using human intelligence in 
academia and the workplace. 

Saying that the way you describe the sit¬ 
uation depends upon the problem you want 
to solve is more in the spirit of engineer¬ 
ing than of pure science. Both are legitimate 
worldviews. 

And there is another worldview. All of 
these models are attempts to find structure 
in the data from conventional cognitive test¬ 
ing. In the next chapter we look at theories 
that attempt to expand theories of cognition 
beyond the data obtained from the “Drop in 
from the Sky” paradigm. 

62 Carroll, 1993. 


CHAPTER 5 


Taking Intelligence Beyond 
Psychometrics 


Newton said that if he had seen further it 
was because he had stood on the shoulders 
of those who had preceded him. In 
Psychology we stand on their faces. 

The late R. C. Bolles, Professor 
and historian of Psychology, the 
University of Washington, personal 
communication 


Walter Lippmann had more to say about 
intelligence than was quoted in Chapter 1. 

[The intelligence test] does not weigh or 
measure intelligence by any objective stan¬ 
dard. It simply arranges a group of people 
in a series from best to worst by balanc¬ 
ing their capacity to do certain arbitrarily 
selected puzzles, against the capacity of all 
the others. 

Lippmann, 1922b 

This criticism strikes at the heart of psy¬ 
chometric theory. Lippmann believed that 
“real” intellectual talent has little to do with 
a talent for test taking. His criticisms have 
been echoed, with surprisingly little change, 
in today's world. 


The gist of the modern criticisms of intel¬ 
ligence testing is that the tests capture a very 
narrow slice of human cognition. The fol¬ 
lowing quotes are from books written for 
the general public: 

Almost everything you know about intelli¬ 
gence - the kind of intelligence psychologists 
have most often written about - deals with 
only a tiny and not very important part of 
a much broader and more complex intel¬ 
lectual spectrum. 

Sternberg, 1996, p. 11 

The score on an intelligence test does pre¬ 
dict one's ability to handle school subjects, 
though it foretells little of success in later 
life. 

Gardner, 1983, p. 3 

The evidence does not warrant such skep¬ 
ticism. In fact, perhaps because of a per¬ 
ceived need to keep things simple in popular 
books, these quotes are oversimplifications 
of the authors’ opinions. Both Gardner and 
Sternberg are considerably more circum¬ 
spect when they write papers addressed to a 
professional audience. 
Nonetheless, the public clearly wants to
hear the attacks on testing. Why? 

Possibly one of the strongest reasons is a 
gut feeling that tests with at best a tenu¬ 
ous resemblance to real life problem solv¬ 
ing should not predict how well people can 
solve important everyday problems. This 
was the basis of Lippmann’s objection to 
the first Stanford-Binet test battery. Imagine 
what Lippmann would have to say about a 
Progressive Matrices test!

Distrust is magnified by a certain amount 
of defensiveness. Burt Green, a respected 
psychometrician at Johns Hopkins Univer¬ 
sity, once told me that people think that a 
fair question is one that they can answer. 
There is something to this. Sternberg opens 
one of his popular books with a diatribe 
about how he, personally, did poorly on an 
intelligence test. 1 His anecdote will resonate 
with anyone who feels that their test score 
does not reflect their true mental capability. 

Anecdotes about how tests failed to pre¬ 
dict outcomes are very common. There are 
many stories of people who did well despite 
their IQ, and a few stories of people (whom 
the story teller usually did not know or 
did not like) who did poorly even though 
they had high IQs. I have never met anyone 
who complained that their IQ score over¬ 
estimated their ability. 

Those defending testing reply: The plural
of “anecdote” is not “data.” No one has
ever claimed that IQ scores are either a 
perfect predictor or the only predictor of 
social outcomes. Depending upon the sit¬ 
uation and the statistical assumptions one 
wants to make, the correlations between 
the trait underlying cognitive test scores and 
either academic or industrial performance 
are on the order of .5 or even higher. (See 
Chapter 10 for elaboration.) Cognitive tests, 
in some form, are given to literally mil¬ 
lions of people annually, so there is plenty 
of room for anecdotes about how a test 
did not predict for a particular individual, 
even though the test is a valid predictor for 
the population as a whole. The argument is 
valid, but it will resonate more strongly with 

1 Sternberg, 1996, p. 13. 


members of the American Statistical Asso¬ 
ciation than with members of the general 
public, for statistical reasoning is very much 
an acquired skill. 2 
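
The statistical argument can be made concrete with a short simulation. The sketch assumes that test scores and outcomes are standardized and jointly normal with a correlation of .5. The sample size, the cutoff for a “high” score, and the definition of a disappointing outcome are illustrative choices, not values taken from any study.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

TRUE_CORRELATION = 0.5   # assumed correlation between test score and outcome
N_PEOPLE = 200_000       # simulated examinees (placeholder sample size)
HIGH_SCORE_CUTOFF = 1.0  # "high scorer": at least 1 SD above the mean (assumption)

high_scorers = 0
high_scorers_below_average = 0
for _ in range(N_PEOPLE):
    score = rng.gauss(0.0, 1.0)
    noise = rng.gauss(0.0, 1.0)
    # Build an outcome that correlates TRUE_CORRELATION with the score.
    outcome = TRUE_CORRELATION * score + math.sqrt(1.0 - TRUE_CORRELATION ** 2) * noise
    if score >= HIGH_SCORE_CUTOFF:
        high_scorers += 1
        if outcome < 0.0:  # outcome below the population average
            high_scorers_below_average += 1

fraction_below_average = high_scorers_below_average / high_scorers
print(f"High scorers: {high_scorers}")
print(f"High scorers with below-average outcomes: {high_scorers_below_average}")
print(f"Fraction: {fraction_below_average:.2f}")
```

Even with a correlation of this size, a substantial minority of the simulated high scorers end up with below-average outcomes. When millions of people are tested, such a minority supplies a very large number of true anecdotes without contradicting the validity of the test for the population as a whole.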

Another reason that people may distrust
the tests stems from valid but limited
personal experiences. Interviews with “suc¬ 
cessful people,” including managers, reg¬ 
ularly produce assertions that personality 
variables, such as self-discipline and open¬ 
ness to experience, are more important than 
intelligence. This contradicts the evidence, 
for in fact the correlations between indices 
of success in the industrial world and per¬ 
sonality variables are about half the correla¬ 
tions between success and intelligence test 
scores. 3 Why the discrepancy? 

People’s personal social circles are highly 
selective. College graduates generally asso¬ 
ciate with other college graduates; industrial 
workers talk with other industrial work¬ 
ers. Intelligence plays a role in determining 
the people we know. Therefore the extent 
of variation in intelligence among one's 
acquaintances is likely to be smaller than 
the variation in intelligence in the general 
population. Providing that we avoid out¬ 
right psychopathology, there is no compa¬ 
rable restriction on variations in personality 
traits. Intelligence may have less influence 
over behavior than personality variables do, 
within each person’s own social circle, but 
more influence over behavior than personal¬ 
ity variables do, when considered across the 
entire society. 
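
This restriction-of-range argument can be illustrated with a simulation. The sketch assumes that success is a weighted sum of intelligence, one personality trait, and unrelated influences, with the personality weight set to half the intelligence weight to echo the ratio mentioned above. The weights and the band of intelligence that defines a “social circle” are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2)  # fixed seed so the run is repeatable

N_PEOPLE = 200_000
W_INTELLIGENCE = 0.50  # hypothetical weight of intelligence on success
W_PERSONALITY = 0.25   # hypothetical weight of a personality trait
W_OTHER = math.sqrt(1.0 - W_INTELLIGENCE ** 2 - W_PERSONALITY ** 2)

people = []
for _ in range(N_PEOPLE):
    intelligence = rng.gauss(0.0, 1.0)
    personality = rng.gauss(0.0, 1.0)
    success = (W_INTELLIGENCE * intelligence
               + W_PERSONALITY * personality
               + W_OTHER * rng.gauss(0.0, 1.0))
    people.append((intelligence, personality, success))

# A "social circle": people within a narrow band of intelligence (assumption).
circle = [p for p in people if 0.5 <= p[0] <= 1.0]


def correlations_with_success(group):
    """Return (r of intelligence with success, r of personality with success)."""
    intelligence, personality, success = zip(*group)
    return pearson(intelligence, success), pearson(personality, success)


r_int_all, r_pers_all = correlations_with_success(people)
r_int_circle, r_pers_circle = correlations_with_success(circle)

print(f"Whole population: intelligence r = {r_int_all:.2f}, personality r = {r_pers_all:.2f}")
print(f"Within circle:    intelligence r = {r_int_circle:.2f}, personality r = {r_pers_circle:.2f}")
```

In the simulated population as a whole, intelligence is the stronger correlate of success. Within the restricted circle the ordering reverses, because intelligence hardly varies there while personality varies as much as it does anywhere else.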

Finally, two philosophic biases may lead 
some people, especially social liberals (the 
dominant social philosophy among aca¬ 
demic writers), to downplay the concept 
of intelligence, as defined by psychometric 
tests. 

Test scores are correlated with famil¬ 
ial socioeconomic status (SES); the higher 
your family’s SES, the higher your test 
score is likely to be, and vice versa. There¬ 
fore, there is concern that using test scores 
in personnel selection, and especially as a 

2 Amsel, Langer, & Loutzenhiser, 1991; Gigerenzer
et al., 2007.

3 Hunt, 1995b. 
criterion for admission to higher education,
will reinforce the existing social order. The 
concern is somewhat ironic, because one of 
the purposes of adopting cognitive tests as 
screening devices was to reduce the hold 
that the upper social class had on access 
to higher education! 4 Nevertheless, the con¬ 
cern is valid. If test scores alone were to 
be used in admitting people to educational 
institutions, there would be a tendency to 
select applicants from families with moder¬ 
ate to high SES. The extent to which one 
sees this as a problem depends upon one’s 
philosophy about the purpose of higher edu¬ 
cation, a topic that is far beyond the scope 
of a discussion of intelligence. 

The second philosophic bias is closely 
related to the first. There are differences 
between demographic groups on virtually 
all cognitive test scores, including the SAT, 
ACT, and ASVAB. In the United States,
Whites and Asian Americans generally have
higher scores than African Americans and 
Latinos. The gap in test scores mirrors the 
gap in various measures of socioeconomic 
status, such as income and health statistics. 5 
If you accept the test scores as valid indi¬ 
cators of intelligence, in the broad sense, 
then you may appear to have accepted dif¬ 
ferences in intelligence as a partial explana¬ 
tion for the gap in the socioeconomic statis¬ 
tics. This important and very complex issue 
is discussed in detail in Chapter 11. 

Wanting something to be so, even for the 
best of reasons, does not make it so. What 
sort of evidence would make us want to 
either replace or expand the model of intel¬ 
ligence developed from psychometric stud¬ 
ies? In order to answer that question we have 
to consider how intelligence, in the concep¬ 
tual sense, rather than intelligence in the 
sense of test scores, influences behavior. This 
requires an amplification of the argument 
presented earlier, in Chapter 1. 

The problem is illustrated diagrammatically
in Figure 5.1. What we are ultimately
interested in is how intelligence produces 

4 Lemann, 1999. See the discussion of the develop¬ 
ment of the SAT in Chapter 2. 

5 Herrnstein & Murray, 1994. 
socially relevant behaviors. To investigate
this we have to measure both intelligence 
and the behaviors. In the case of academics, 
we might want to know how intelligence, 
in the conceptual sense, determines what 
a person learns in school. This is a rela¬ 
tion between concepts. All we can mea¬ 
sure is the relation between test scores and 
grades, while admitting that the test scores 
do not perfectly measure intelligence and 
that grades do not perfectly measure what 
students have learned. A similar argument 
can be made for using test scores to evaluate 
workplace performance. 
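
One way to see what imperfect measurement does to an observed correlation is the classical attenuation formula. The sketch below treats the imperfection of both the test and the grades as random measurement error, which is an added simplification: under that assumption the observed correlation equals the correlation between the underlying concepts multiplied by the square root of the product of the two reliabilities. The numerical values are hypothetical.

```python
import math


def observed_correlation(true_correlation, reliability_x, reliability_y):
    """Correlation expected between two error-laden measures (attenuation)."""
    return true_correlation * math.sqrt(reliability_x * reliability_y)


def corrected_correlation(observed, reliability_x, reliability_y):
    """Estimate of the correlation between the underlying concepts."""
    return observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical values, chosen only for illustration.
TRUE_R = 0.6             # correlation between intelligence and learning
TEST_RELIABILITY = 0.9   # how well the test measures intelligence
GRADE_RELIABILITY = 0.7  # how well grades measure what was learned

observed = observed_correlation(TRUE_R, TEST_RELIABILITY, GRADE_RELIABILITY)
recovered = corrected_correlation(observed, TEST_RELIABILITY, GRADE_RELIABILITY)

print(f"Observed test-grade correlation: {observed:.2f}")
print(f"Corrected estimate of the relation between concepts: {recovered:.2f}")
```

Under these assumptions the correlation between test scores and grades understates the relation between the concepts, and the size of the understatement depends on how well each concept is measured.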

Intelligence is not the only thing influ¬ 
encing grades, and for that matter it is not 
the only thing influencing test scores. Grades 
will be influenced by a variety of nonintel¬ 
lectual factors, such as self-discipline and 
interest. Conventional psychometric test 
scores can also be influenced by nonintellec¬ 
tual factors. Sternberg amplified upon his 
own example by claiming that his IQ test 
score was dramatically influenced, not for 
the better, by a definable personality charac¬ 
teristic called test anxiety, which means what 
the name implies, a tendency to become 
panicked and underperform when tested. 6 

To further complicate things, social 
behavior is not solely the product of the 
individual doing the behaving. Behaviors can 
be elicited or constrained by properties of 
the situation. Gardner has provided a com¬ 
pelling example, in a series of biograph¬ 
ical essays on creativity, as illustrated by 
such disparate figures as Einstein, Picasso, 
T. S. Eliot, and Mahatma Gandhi. 7 Each of 
the creative geniuses Gardner wrote about 
benefited from the support of people who, 
often at considerable sacrifice to themselves, 
played supporting roles so that the geniuses 
could concentrate on the work that, ulti¬ 
mately, made them famous. 

The following sections examine several 
attempts to expand traditional psychomet¬ 
ric models of intelligence. I will start with 
the simplest approach, expanding the con¬ 
ventional range of cognitive tests, and then 

6 Sternberg, 1996, Chapter 1; Sarason, 1980. 

7 Gardner, 1993b. 
Figure 5.1. The relation between underlying traits, social constraints,
test scores, and socially relevant behaviors. Test scores can predict 
but do not cause socially relevant behaviors. Both the behaviors and 
test scores are constrained by noncognitive variables. 


move to a discussion of personality and 
motivational issues. 

5.1. Gardner's Theory of Multiple 
Intelligences 

Howard Gardner's Multiple Intelligences 
(MI) model, presented in his 1983 book
Frames of Mind, 8 became instantly popular
as an alternative to psychometric theories of 
intelligence. Frames of Mind was selected by 
five book clubs and translated into seven lan¬ 
guages. The theory has since received a few 
modifications, but the basic message has not 
changed. 9 While Gardner has stressed the
application of his ideas in grade school edu¬ 
cation, he has not hesitated to discuss possi¬ 
ble applications in other domains. 

8 Gardner, 1983. 

9 Gardner, 1993a, 1999; Chen and Gardner, 2005. 


As the name implies, MI theory is based 
on the assumption that there are many 
types of intelligence, ranging from linguis¬ 
tic skills to musical intelligence and bodily/ 
kinesthetic intelligence. Gardner regards the 
purpose of schooling to be the development 
of each child’s type of intelligence, without 
forcing all children to focus on developing a 
narrow set of linguistic and reasoning skills 
that, he feels, are needed much more in con¬ 
ventional schools than they are in the world 
outside school. While Gardner has made this 
contention in many forums, to my knowl¬ 
edge he has never offered evidence showing 
that the validity of IQ and similar educa¬ 
tional tests is confined to academics. 

Why did Gardner’s view become so pop¬ 
ular? One reason is that he presents an opti¬ 
mistic view of education. MI theory offers 
hope that the child who is not doing well 
in conventional subjects may have supe¬ 
rior talents in art, music, social interactions, 
or even bodily movement. It follows, then,
that it is the teacher's job to identify and 
develop those talents. This approach to edu¬ 
cation is morale-building for students, par¬ 
ents, and teachers. The students and parents 
are assured that there are ways to success 
for every child. The teacher is encouraged, 
for the theory stresses the teacher’s role as 
a diagnostician and facilitator. This is a far 
more professional view of teachers than the 
view that the role of the teacher is to trans¬ 
mit the assigned lesson to the class, en masse. 

By contrast, proponents of the general 
intelligence (g) model are often interpreted 
by educators as saying “some children are 
smart and some aren’t and there is not much 
that you, the teacher, can do about it.” 
Therefore, the teacher’s role is to transmit 
information, which either will or will not be 
picked up by the students, depending upon 
their pre-set intelligence. 

In fact, g theorists have never said this. 
What they have said is a somewhat subtler 
message: “Some children are smart and some 
are not. There is a great deal that you can 
do to raise (or lower) the average level of 
competence in a class, but there is little you 
can do to lower the fact of variation. The 
bright children will learn more than the not- 
so-bright children, and this is a fact that a 
teacher has to live with.” 

The MI model is far more egalitarian than 
the g model. According to it, if the cultural 
situation (i.e., the classroom) is appropri¬ 
ately shaped, every child can have his or her 
potentials drawn forth. Gardner tells teach¬ 
ers that they can make a difference for all 
children, providing that they structure the 
classroom so as to encourage the appropri¬ 
ate talent. 

It would not be logically contrary to any¬ 
thing Gardner has written to assert that even 
if there are many intelligences, there are 
some children who simply do not have much 
of any potential. However, that message is 
downplayed, and to some extent negated, by 
Gardner’s assertion that the different intel¬ 
ligences he has identified are not highly cor¬ 
related. This is an assertion about data, and 
can be evaluated - as we shall do. 
Now let us take a closer look at the
theory. 

5.1.1. The Multiple Intelligences

MI theory is based on two premises. The 
first is that there is no general trait of over¬ 
all mental competence. In Gardner’s view 
positive manifold is a statistical artifact that 
arises because conventional schools empha¬ 
size only a limited range of skills - language, 
abstract reasoning, and rapid responding - 
and because virtually all the tests are pre¬ 
sented using language. In Gardner’s view, 
psychometric studies look at intelligence 
through a “verbal lens.” 

The second premise is that there are, in 
fact, a variety of different intelligences, rang¬ 
ing from linguistic intelligence to bodily/ 
kinesthetic intelligence. But what are they? 
The way Gardner identified the dimensions 
of intelligence could not have been further 
from the way the psychometricians investi¬ 
gate the topic. He says of his preparation for 
his first book, in 1983: 10 

I had always been intrigued by the challenge
and promise of examining human
cognition through a number of discrete 
disciplinary lenses. I enjoyed investigating 
psychology, neurology, biology, sociology, 
and anthropology as well as the arts and 
humanities. And so I began reading sys¬ 
tematically in these areas in order to gain 
as much information as possible about the 
nature of various kinds of human faculties 
and the relationships among them. 

Gardner, 1999, p. 33

This method of analysis, broad reason¬ 
ing followed by reflection, is quite differ¬ 
ent from the emphasis on data collection 
and analysis that characterizes psychomet¬ 
ric research. Gardner himself has observed 
that he has little concern for measurement. 11 
This makes his method of inquiry more akin 
to inquiry in the humanities than to scien¬ 
tific inquiry. That does not make the method 

10 Gardner, 1983. 

11 Gardner, 2006a,b. 
wrong; humanistic inquiry is a legitimate
method of investigation. It has led Gardner
to an updated definition of intelligence:

A biopsychological potential that can be 
activated in a cultural setting to solve prob¬ 
lems or create products that are of value to 
society. 

Gardner, 1999, pp. 33-34 

This definition has been amplified by 
the identification of four criteria for an 
intelligence: 12 

Biological: The trait in question should 
be the product of a biological system, 
and therefore it should be isolatable 
by brain damage and should exhibit 
a development over evolutionary 
history. 

Psychological: The trait should be improv¬ 
able by specific training. Indicators of 
the trait should be correlated with 
other indicators of the same trait, but 
should have only a low correlation 
with indicators of other traits.

Developmental: The trait should have a
distinct developmental history, with 
a definable set of expert-level end- 
state performances. There should also 
be special populations who display 
unusual development of the trait, such 
as idiot savants (who have a single 
exceptional talent, but are otherwise 
somewhat or even profoundly below 
the norm) and prodigies.

Logical: Each intelligence should have a 
definable core set of operations that 
can be expressed in a symbol system. 
The obvious case here is language, 
which is a symbol system whose struc¬ 
ture and use are determined by the 
rules of syntax, semantics, and prag¬ 
matics appropriate to the language and 
culture involved. Music and chore¬ 
ography would also qualify by these 
criteria. 

12 Chen & Gardner, 2005. 


Gardner uses these criteria somewhat 
loosely, for most but not all of his intelli¬ 
gences satisfy all the criteria. Here are the 
intelligences that he has identified: 

Linguistic intelligence: This refers to skill 
with verbal and written language. 
Writers and poets are prototypes of 
individuals possessing this skill. 

Logico-mathematical intelligence: Skill in 
mathematical, numerical, and abstract 
logical reasoning. Mathematicians and 
computer programmers are offered as 
prototypes. 

Spatial intelligence: A skill possessed by 
graphic designers and architects. It is 
the ability to perceive and manipulate 
spatial and visual images. 

Bodily/kinesthetic intelligence: The skills 
displayed by dancers and athletes. 
They deal with the ability to control 
bodily movements. 

Naturalistic intelligence: This skill is criti¬ 
cal for archeologists and botanists, two 
rather different fields. It involves skill 
in dealing with elements in the natural 
environment. 

Interpersonal intelligence: The skills poli¬ 
ticians have to have! The skills re¬ 
quired to deal with people, as opposed 
to ideas or things. 

Intrapersonal intelligence: Skill in under¬ 
standing and regulating one’s own 
emotions, strengths, and desires. 

These are certainly desirable traits, but is 
it useful to call all of them “intelligence”? 13 
Gardner himself has admitted that his use 
of the term “intelligence” was a calculated 
attempt to capitalize on what he saw as a 
tendency in Western culture to value men¬ 
tal competencies, rather than an attempt to 
further the study of what others had defined 
as “intelligence.” 14 Does grouping these var¬ 
ious traits together lead to a natural kind, a 
set of behaviors that have similar causes and 
lead to similar effects? 

13 Hunt, 2004. 

14 Gardner, 1999, p. 33. 

5.1.2. The Evidence for the Theory 
of Multiple Intelligences 

MI theory resonates with popular ideas 
about intelligence. The idea that there are 
multiple talents and that people can be high 
on one (e.g., artistic skill) while low on oth¬ 
ers (e.g., analytic skills) accords with pop¬ 
ular conceptions of the distribution of tal¬ 
ent. Surveys have consistently shown that 
the popular concept of intelligence, in both 
developed and undeveloped countries, is 
broader than the range of talents evaluated 
in tests. Social skills, in particular, are given 
more weight in the popular mind than in 
the classic testing paradigm. When people 
are asked to rate themselves and others on 
Gardner’s multiple intelligences, they see 
the task as reasonable. But when people are 
asked to give ratings of overall intelligence 
this is also seen as a reasonable task, and ver¬ 
bal, mathematical, and spatial skills are the 
biggest contributors to the overall ratings. 15 
Whether or not you get popular acceptance 
of the MI model or of something closer to 
the g-VPR model depends a good deal on 
how you phrase the question. 

In any case, popular opinion is hardly scientific 
evidence. We need a source of facts, not opinion. 
Evidence for the utility of a theory of intelligence 
comes from three sources: the theory’s utility in 
explaining displays of human abilities in the school, 
its utility in explaining them in the workplace, and 
psychometric studies intended to evaluate the theory. 

The MI model has been an easy sell to 
educators. They have developed a num¬ 
ber of educational programs based on the 
multiple intelligences, usually for the early 
childhood and primary school years. The 
prototypical program first assesses children, 
and then provides educational interven¬ 
tions designed to develop their strengths. 
The intervention is followed by a final 
assessment. Consistent with Gardner’s view, 
assessment is typically done by engag¬ 
ing children in a (hopefully) interesting 
task, rather than by formal testing. These 

15 Furnham, 2001. 


programs have been reviewed favorably by 
Gardner himself, 16 and with somewhat more 
reservation by some of the authors in an 
edited volume examining evidence for the 
theory. 17 

These studies show that children in kin¬ 
dergarten and the early elementary school 
years become more engaged when they par¬ 
ticipate in activities that suit their individ¬ 
ual talents, than when they must conform 
to activities chosen for a class as a whole. 
Not surprisingly, these talents are then fur¬ 
ther developed, for you learn to do what you 
practice doing. 

But how relevant is this to the develop¬ 
ment of intelligence? Or, for that matter, to 
the design of school programs? To answer 
these questions we would have to compare 
the later performance of children who had 
gone through one of the MI programs to the 
later performance of comparable children 
who had gone through a standard early edu¬ 
cation program. To my knowledge no such 
test has been reported. I also worry that if 
such an experiment did not provide support 
for the popular Multiple Intelligence model 
the report would suffer from the bias against 
reporting negative results. 

I doubt that negative results would have 
much influence, for Gardner and his sup¬ 
porters can offer two plausible explana¬ 
tions for them. Gardner claims that higher 
levels of instruction (e.g., middle school, 
high school, and college) are organized 
toward the development of traditional ver¬ 
bal and analytic skills. Therefore, the results 
obtained by training other intelligences dur¬ 
ing the primary school years would not 
be valued as children progressed up the 
educational pipeline. 18 A second reason for 

16 Gardner, 1993a; Chen & Gardner, 2005. 

17 Schaler, 2006. 

18 A good case can be made for the argument that 
schools are oriented toward developing analytic and 
verbal skills, at the expense of developing skills in 
artistic and social endeavors. But is this a bad thing? 
Are schools supposed to develop children's varied 
talents, or are they supposed to train a workforce for 
a post-industrial society? Educators tend to take the 
first view. Many observers from outside the educational 
field take the second view. The National 

disregarding negative results is that a program 
designed to develop multiple intelligences may not 
have been properly organized. Gardner has said that he approves of 
some implementations, and not of others. 
This offers him an out; if a Multiple Intel¬ 
ligences program does not show success, it 
must be because the program was improp¬ 
erly implemented. And of course this could 
be true. But no definition of proper imple¬ 
mentation has been given other than that the 
results should be in accord with the theory. 

As far as I know, there has never been a 
rigorous attempt to evaluate the Multiple 
Intelligence approach as a guide to selec¬ 
tion or evaluation of personnel in workplace 
settings. Indeed, the evidence is strongly in 
favor of the use of general measures of 
intelligence - that is, g-loaded measures - as 
predictors of workplace performance. This 
is not to say that special skills are irrel¬ 
evant in certain extreme situations. Poets 
and airplane mechanics do different things! 
However, the advocates of MI have been 
primarily concerned with education. 

Three Canadian psychologists, Beth 
Visser, Michael Ashton, and Philip A. Vernon, 
took a psychometric approach to the 
evaluation of MI theory. 19 They constructed 
two different behavioral tests for each of the 
eight intelligences. For example, linguistic 
ability was tested by a vocabulary test and 
by a test requiring people to identify a word 
that had the opposite meaning from a tar¬ 
get word. Social intelligence was assessed 
by a task in which participants saw sev¬ 
eral cartoon drawings depicting part of a 
story and then had to choose a final cartoon 
indicating a logical completion of the story, 
and by a task in which participants had to 
determine what a word or phrase meant, 
in context. In addition to tests that evalu¬ 
ated the intelligences specified by MI, par¬ 
ticipants took the Wonderlic Personnel Test 

Academy of Science (NAS) report Rising Above 
the Gathering Storm: An Agenda for American Sci¬ 
ence and Technology (2007) can be read in places as 
expressing concern that our system is training too 
many poets, authors, and business managers, and 
not enough engineers and scientists. 

19 Visser, Ashton, & Vernon, 2006. 


(WPT), which is a g-loaded test that 
stresses rapid responding to simple prob¬ 
lems. (See the description of this test in 
Chapter 2.) 

The 17 tests were given to 200 adult vol¬ 
unteers drawn from a university commu¬ 
nity. Participants were recruited from stu¬ 
dents in thirty different departments, thus 
maximizing the chance that the sample 
would include people with diverse sets of 
talents. 

The Visser and colleagues study stacked 
the deck in favor of the MI model. The par¬ 
ticipants were adults with different train¬ 
ing and interests, and were of above-average 
general intelligence. As noted in Chapter 4, 
greater differentiation of ability is typically 
found in above-average samples than in 
below-average samples. Nevertheless, the 
results did not support the model. The tests 
of what everyone would agree are cognitive 
abilities - the linguistic, mathematics, and 
spatial tests - had strong loadings on a gen¬ 
eral factor and substantial correlations with 
the WPT. So did the tests of natural intel¬ 
ligence, which evaluated a person’s knowl¬ 
edge about the biological world. This result, 
and other results in the literature, 20 are con¬ 
trary to Gardner’s frequent denial of the 
strength of the general factor in intelligence. 
The tests of noncognitive factors, such as the 
tests of bodily/kinesthetic intelligence, had 
low loadings on the general factor. This is 
consistent with the argument that combin¬ 
ing all these variables under the name “intel¬ 
ligence” forms a class of behaviors that is no 
longer a natural kind. 
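
To make the idea of a loading on a general factor concrete, here is a minimal sketch on simulated data (not the data from the Visser study). It assumes four made-up "cognitive" tests that share a common factor and two made-up tests that do not, and it uses the first principal component of the correlation matrix as a simple stand-in for a general factor; a formal factor analysis, as used in the psychometric literature, may differ in detail. All names in the code are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_component_loadings(columns, iterations=500):
    """Loadings of each test on the first principal component.

    Uses power iteration on the correlation matrix; this is an
    illustrative stand-in for extracting a general factor.
    """
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)
    )
    return [c * math.sqrt(eigenvalue) for c in v]


# Simulated examinees (assumption: 2000 people, one common factor g).
rng = random.Random(0)
n_people = 2000
g = [rng.gauss(0, 1) for _ in range(n_people)]


def cognitive_test():
    # Each simulated cognitive test is partly g, partly test-specific noise.
    return [0.7 * gi + math.sqrt(1 - 0.49) * rng.gauss(0, 1) for gi in g]


def unrelated_test():
    # A simulated test that shares nothing with g.
    return [rng.gauss(0, 1) for _ in range(n_people)]


tests = [cognitive_test() for _ in range(4)] + [unrelated_test() for _ in range(2)]
loadings = first_component_loadings(tests)
for index, loading in enumerate(loadings):
    print(f"test {index}: loading {loading:.2f}")
```

In this simulation the four tests built from g load strongly on the first component, while the two unrelated tests load near zero, which is the qualitative pattern described above for cognitive versus noncognitive measures.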

A second result was also damaging to 
the MI theory. According to MI theory the 
correlations between different tests evalu¬ 
ating the same intelligence should be high. 
For instance, after accounting for the effects 
of general intelligence there should be a 
high correlation between the two linguis¬ 
tic tests, the two mathematical tests, and so 
on. In general, there was not. The two lin¬ 
guistic tests had a residual correlation, after 

20 Brody, 2006. See also Carroll’s (1993) discussion of 
lack of evidence for Gardner’s view of general 
intelligence. 

removing the effects of g, of .29. None of 
the other seven correlations between tests 
of the same intelligence exceeded .18, and 
the average correlation was .10. 
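
A residual correlation of this kind is what is left of the correlation between two tests once the influence they share through g has been removed. One simple way to compute such a quantity is the first-order partial correlation. The sketch below uses invented correlation values, not the values from the Visser, Ashton, and Vernon study, and the function name is hypothetical; the original authors' exact procedure may have differed.

```python
import math


def partial_corr(r_xy: float, r_xg: float, r_yg: float) -> float:
    """Correlation between tests x and y after partialling out g.

    r_xy: raw correlation between the two tests
    r_xg, r_yg: correlation of each test with the g measure
    """
    return (r_xy - r_xg * r_yg) / math.sqrt((1 - r_xg**2) * (1 - r_yg**2))


# Case 1 (invented numbers): the two tests correlate .50, but each
# correlates .70 with g, so g accounts for almost all of their overlap.
residual_g_only = partial_corr(r_xy=0.50, r_xg=0.70, r_yg=0.70)

# Case 2 (invented numbers): the two tests correlate .60 and each
# correlates only .50 with g, so something beyond g links them.
residual_shared = partial_corr(r_xy=0.60, r_xg=0.50, r_yg=0.50)

print(round(residual_g_only, 3))
print(round(residual_shared, 3))
```

MI theory predicts the second pattern for two tests of the same intelligence. Small residuals, as in the first case, are what one would expect if the tests are related mainly through general intelligence.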

In sum, this study provided no evi¬ 
dence whatsoever that MI theory actually 
describes the distribution of cognitive skills 
in college students. 

Gardner 21 dismissed these results, on two 
grounds. He disputed the appropriateness of 
some of the tests, and he rejected the idea 
that it is even appropriate to evaluate the 
theory using the standard psychometric test¬ 
ing paradigm. He concluded that Visser and 
her colleagues got the results that they did 
because they recreated the standard scholas¬ 
tic testing situation, with a strong emphasis 
on logical reasoning. 

Gardner’s rejoinder to Visser and her 
colleagues was typical of his reaction to 
attempts to assess his ideas. In his own 
words, 

As I have often explained except for Project 
Spectrum I have not devoted energies to the 
devising of tasks that purport to assess MI. I 
have no objection to others doing so though 
the efforts so far have been modest. In my 
own experience, I have been impressed by 
efforts to create environments in which the 
use of multiple intelligences is highlighted. 

Gardner, 2006b, p. 504 

5.1.3. An Evaluation of MI 

Despite the popularity of the MI idea in 
some educational circles, there is virtually 
no objective evidence for the theory. The 
attitude toward measurement that Gardner 
expressed in his reaction to Visser and her 
colleagues does not lead to good science. Sci¬ 
entific ideas are tested by their ability to 
account for data, and such tests cannot be 
carried out unless there is a commitment 
to means of measurement. Gardner cannot 
simply dismiss results such as those of Visser 
and her colleagues unless he offers an alter¬ 
native means of measurement. 

The fact that children can be enthusi¬ 
astic about a program that plays to their 

21 Gardner, 2006b. 


own task-assessed strengths is interesting, 
although fairly obvious. However, there 
appears to be little evidence, positive or 
negative, that early childhood training using 
Multiple Intelligences theory translates into 
good performance later on in the educa¬ 
tional process. One can blame the educa¬ 
tional process for this, arguing that the later 
educational stages stress formal, analytic rea¬ 
soning to the detriment of other intelli¬ 
gences. Whether such an attack is valid or 
not depends upon what one thinks formal 
education is for. (See footnote 18 for an 
expansion on this idea.) 

There is no evidence that expanding 
the notion of intelligence beyond obvious 
cognitive characteristics, to include such 
things as musical and bodily/kinesthetic 
“intelligence,” creates a natural kind. 
These noncognitive characteristics are all 
admirable talents. They can and should 
be studied separately. Forcing them under 
the umbrella of intelligence will lead to an 
unmanageable topic of study. It has been 
said that science proceeds when it defines 
tasks that carve nature at its joints. If this 
is true, Multiple Intelligence theory is not a 
map for scientific progress. 

Having said this, I close with a strong 
endorsement of one of Gardner's positions, 
as being useful for science, and a more qual¬ 
ified endorsement of another, as a way of 
thinking about education. 

Gardner makes a good point when he 
advocates expanding the evaluation of intel¬ 
ligence beyond the conventional psycho¬ 
metric testing paradigm. The field could 
definitely profit from more analyses of indi¬ 
vidual differences in performance of every¬ 
day tasks, in fields ranging from the evalua¬ 
tion of surgeons, mechanics, and lawyers, to 
the observation of schoolchildren. Indeed, in 
industrial-organizational psychology there 
is considerable concern that personnel 
assessment is limited by our ability to 
record details of on-the-job performance. 22 
Gardner deserves credit for highlighting 
the problem, although (possibly due to his 

22 S. Hunt, 2007. 

disdain for assessment) he has not provided a 
solution. 

The qualified endorsement is that any 
humane person ought to agree with Gard¬ 
ner that schools should, insofar as possible, 
encourage students to develop their personal 
abilities and interests. I would like to see 
students given much more opportunity to 
develop talents in music, the performing 
arts, and even, for all students rather than 
just for selected stars, athletics. However, I 
also want students to be prepared for the 
advanced, post-industrial society that they 
are going to live in. In order to meet this 
goal schools have to stress the development 
of linguistic and mathematical skills. The 
extent to which other topics can be taught 
depends on how much time and money soci¬ 
ety is willing to invest in education. Defining 
the curriculum is a matter for educational 
policy, not for science. 

5.2. Robert J. Sternberg's Theory 
of Successful Intelligence 

The next theorist whose work will be dis¬ 
cussed, Robert J. Sternberg, has, like Gard¬ 
ner, attempted to expand our views of 
intelligence beyond psychometrics. Unlike 
Gardner, though, Sternberg’s research is 
grounded in previous research, and he has 
assiduously conducted experiments to sup¬ 
port and expand his ideas. 

Sternberg has been a strong advocate for 
the adoption of his theories in education 
and in industry. To that end he has writ¬ 
ten widely and made many presentations 
to nonspecialists, including both the gen¬ 
eral audience and policy makers. By con¬ 
trast, most developers of psychometric theo¬ 
ries have written mainly for other scientists. 
When evaluating Sternberg's work one has 
to consider both what he is saying, either 
about his own work or that of others, and 
the audience to whom he is saying it, either 
specialists or nonspecialists. 

I will first consider Sternberg's views on 
the work of others. This provides a motiva¬ 
tion for the development of his own theo¬ 
retical position. 


5.2.1. Sternberg's Criticism of Previous 
Work on Intelligence 

Sternberg has shown a great deal of aware¬ 
ness of previous work on intelligence. This 
is evidenced by his excellent review of the 
field as it stood in 1990. 23 In it he presented 
a carefully balanced analysis of different 
theoretical approaches to intelligence, as 
they existed at that time. He was dissatis¬ 
fied with the field because of what he saw 
as its myopic view of the range of human 
intelligence. He also felt that an appropri¬ 
ate goal for intelligence research was to pro¬ 
vide useful information to guide educators 
and industrial leaders in the development of 
people's capacities, rather than simply cata¬ 
loging individual differences. To satisfy this 
goal he has written numerous books and 
articles directed to the general public and 
to educators. Some of these contain harsh 
criticisms of the field. I have already quoted 
from one of these books. Here is what he has 
to say on the next page of the same work. 

Almost everything you know about intelli¬ 
gence - the kind of intelligence psychologists 
have most often written about - deals with 
a tiny and not very important part of a 
much broader and more complex intellec¬ 
tual spectrum. 

Harsh words indeed! In his writings in the 
technical literature Sternberg takes a more 
measured tone. For instance, he has been 
very much involved in developing educational 
programs that combine testing with instruction 
tailored, on the basis of the test results, to student 
strengths and weaknesses. 24 Here is what he has to say 
about the results of a large project involv¬ 
ing an attempt to improve the prediction 
of success in college (after citing several 
meta-analytic studies of prediction of col¬ 
lege performance, which produced valid¬ 
ity estimates for predicting freshman GPA 
ranging from .4 to .6): 

All together these results suggest good pre¬ 
dictive validity for the SAT for freshman 

23 Sternberg, 1990. 

24 Sternberg, Grigorenko, & Zhang, 2008. 

college performance. But as is always the 
case for any test or type of test, there is 
room for improvement .... Thus the theory 
[of successful intelligence] does not suggest 
replacing, but rather augmenting the SAT 
in the college admissions process. 

Sternberg, 2006, p. 322; emphasis in 
the original text 

This is a correct analysis of the cur¬ 
rent situation, and a reasonable proposal for 
expanding the study of intelligence. What is 
the nature of the expansion Sternberg pro¬ 
poses? 

5.2.2. The Theory of Successful Intelligence 

Because Sternberg has written so much, 
and because his thinking has (appropriately) 
evolved over his career, it is sometimes dif¬ 
ficult to assess just what his current posi¬ 
tion is. My comments will be based largely 
on the arguments presented in his book 
Practical Intelligence in Everyday Life, 25 published 
in 2000, and subsequent related publications. 
Practical Intelligence is a narrative 
account of research conducted over the pre¬ 
vious twenty years, and thus provides a 
good jumping-off place for an analysis of his 
ideas. 

Sternberg distinguishes three classes of 
intelligence: analytic (sometimes called aca¬ 
demic), creative, and practical intelligence. 
He believes that conventional psychomet¬ 
ric and personnel screening tests, such as 
the SAT and AFQT, are essentially mea¬ 
sures of analytic intelligence. The analytic 
intelligence tests that he and his colleagues 
have developed intentionally resemble con¬ 
ventional cognitive aptitude tests. They do 
not seem to have the predictive power of 
tests like the SAT and AFQT, but that 
could be because they are much briefer and 
because, after all, the conventional tests now 
in use have been refined over the course of 
decades. Sternberg claimed that any major 
expansion of our ideas of intelligence will 
require the development of tests of creative 
and practical intelligence. 

25 Sternberg et al., 2000. 


CREATIVITY 

There is a long history of trying to develop 
tests of creativity apart from tests of intelli¬ 
gence. This has proven to be a difficult task, 
for a simple reason. What is the criterion? 
It is easy to collect anecdotes about unde¬ 
niably creative individuals, especially those 
with very high levels of accomplishment. 
Albert Einstein is a frequently cited exam¬ 
ple. However, it is difficult to go further 
because highly creative people are rare. 

Dean Simonton, a professor at the University 
of California, Davis, who has made an 
extensive study of high levels of creativity, 
has pointed out that, unlike most human 
traits, creativity is not normally distributed. 
The majority of truly creative contributions 
are made by a very small fraction of the 
workers in any given field. (If a contribu¬ 
tion is commonplace it will probably not be 
considered creative.) Simonton also points 
out that creative contributions are usually 
specialized. This could be because the psy¬ 
chological traits required to be creative are 
specific to a particular field. There is no 
compelling argument that being a creative 
chemist and being a creative artist draw on 
the same traits. It could also be that only a 
tiny, tiny percentage of people are creative 
in more than one field because there sim¬ 
ply is not enough time. There is very little, 
if any, evidence for the existence of effort¬ 
lessly creative people, who can contribute to 
one field after another without working very 
hard in any of them. And finally, both cre¬ 
ative accomplishments and their recognition 
depend very much upon the social setting. 26 

These considerations make it difficult to 
draw generalizations from the study of peo¬ 
ple who we all agree are creative. Therefore, 
scientists who want to study creativity often 
wind up studying people who are believed 
to be creative in everyday life. This is not 
easy, either. Extrapolating from Simonton’s 
analysis, the difficulty may be that there are 
not that many such people. Or it might be 
that they are there, but are not motivated to 
reveal themselves, because we do not want 
to have to be creative on a daily basis. The 

26 Simonton, 1984. 

world works much better if, most of the 
time, the problems that people encounter 
are ones that they know how to solve. A 
day on which none of the problem-solving 
methods you have learned works is a bad, 
frustrating day. 

This presents a problem for a test con¬ 
structor. Psychologists construct intelligence 
tests by identifying people who display dif¬ 
ferent levels of cognitive competence, as 
indicated by grades, supervisor ratings, and 
objective tests of on-the-job performance, 
and then designing tests that discriminate 
between performers and nonperformers. If 
creative behaviors themselves are rare or 
hard to identify, test constructors do not 
know where to begin. 

It is possible to measure a sort of creativity 
referred to as divergent thinking. Examinees 
are shown a situation or object, and asked 
to list as many ways the test stimulus can be 
used or interpreted as they can - for exam¬ 
ple, asking people to list the different ways 
you can use a brick. The response is then 
assessed by judges. Such tests are face-valid 
as measures of the ability to produce many 
and/or unusual responses. But does this have 
anything to do with the creation of creative 
products? More particularly, does a creativ¬ 
ity test requiring divergent thinking measure 
a useful trait that is not measured by intelli¬ 
gence tests? 

There is evidence that it does. E. Paul 
Torrance, a professor of psychology at the 
University of Georgia, developed a widely 
used series of creativity tests along the lines 
just described. During the 1960s they were 
given to elementary school children in two 
selected schools. The students were also 
given the Stanford-Binet intelligence test. 
Torrance’s participants were quite bright; 
the mean IQ was 120. Creativity was assessed 
by records of performance in adulthood, in 
some cases as much as forty years later. 
A modern analysis of Torrance's data 27 has 
shown that in this high-IQ sample creativity 
measures taken in childhood made a sub¬ 
stantial contribution to predicting lifetime 

27 Plucker, 1999. 


creativity, independently of predictions 
based on childhood IQ scores. 
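
The phrase "independently of predictions based on childhood IQ scores" describes incremental validity: how much the squared multiple correlation rises when a second predictor is added to the first. The sketch below works this out with the standard two-predictor formula. The three correlations are invented for illustration and are not the values reported for Torrance's data; the variable names are hypothetical.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation of a criterion y with predictors 1 and 2.

    r_y1, r_y2: correlation of each predictor with the criterion
    r_12: correlation between the two predictors
    """
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


# Invented values: adult creative achievement correlates .30 with
# childhood IQ and .50 with childhood creativity scores, and the two
# childhood measures correlate .20 with each other.
r_criterion_iq = 0.30
r_criterion_creativity = 0.50
r_iq_creativity = 0.20

r2_iq = r_criterion_iq**2  # variance explained by IQ alone
r2_both = r_squared_two_predictors(
    r_criterion_iq, r_criterion_creativity, r_iq_creativity
)
delta_r2 = r2_both - r2_iq  # gain from adding the creativity measure

print(f"IQ alone: {r2_iq:.3f}")
print(f"IQ plus creativity: {r2_both:.3f}")
print(f"Increment: {delta_r2:.3f}")
```

A sizable increment is what it means for creativity measures to add to prediction beyond IQ. In a sample with a restricted range of IQ, such as a high-IQ group, the IQ-only term would tend to be smaller than in the general population.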

Sternberg is on firm ground in calling for 
an expansion of intelligence testing to con¬ 
sider creativity. Would the effort be worth 
the expense? That is hard to say. Statisti¬ 
cally, the gains in prediction would probably 
be small. However, identifying creative peo¬ 
ple could be extremely important for eco¬ 
nomic and social reasons. 

PRACTICAL INTELLIGENCE 

Practical intelligence refers to the ability to 
deal with ongoing, realistic, and at times 
ill-defined problems. Sternberg has placed 
great stress on the importance of measuring 
this trait. He claims that tests of academic/ 
analytical intelligence are incomplete 
because they rely on abstract problems 
in which all necessary information is pre¬ 
sented. The practical intelligence problems 
that Sternberg and his colleagues have 
developed either ask for specific, task¬ 
relevant knowledge or describe situations 
claimed to be realistic, and ask examinees 
what they would do. 

Table 5.1 shows some examples of ques¬ 
tions that have been asked on tests of 
practical intelligence. Many of these ques¬ 
tions resemble current industrial personnel 
selection practices. Asking for an analysis 
of a hypothetical situation is a low-fidelity 
simulation of performance on the job. In 
industrial-organizational psychology, tests 
that do this are called situational judgment 
tests, and have been used for years. It is of 
interest that within industrial-organizational 
psychology they are considered to reveal 
personality traits, such as preferences for 
certain types of action, as well as contain¬ 
ing a cognitive component. 28 

Sternberg and his collaborators have 
stressed the importance of tacit knowledge 
as a component of practical intelligence. By 
this they mean knowledge that is not explicitly 
taught but that is required in many situations. 
Tacit knowledge is often procedural knowledge, 
knowing what to do, rather than declarative 
knowledge, knowing how to describe the situation. 
This is a notion worth examining. 

28 S. Hunt, 2007, pp. 69-70. 

Table 5.1. Examples of tasks used by Sternberg and his colleagues to assess practical 
intelligence 

Setting: Evaluating knowledge of folk medicine in rural Kenya. 
Item type: Knowledge of the uses of traditional remedies. 
Item: The examinees (Kenyan children) are given a description of a person's symptoms, the Kenyan name of the illness, and asked which of five native herbal medicines are appropriate for the case. 
Reference: Sternberg et al., 2001. 

Setting: Prediction of success of graduate students in an American business school. 
Item type: Skills in solving a practical problem. 
Item: The examinee is given the role of the human resources manager of a manufacturing plant facing a personnel shortage. Employees are working excessive amounts of overtime and morale is low. The examinee is to suggest a course of action. Materials provided: current employment figures and job-satisfaction survey results. 
Reference: Hedlund et al., 2005. 

Setting: Prediction of graduate student success in an American business school. 
Item type: Situational judgment test. 
Item: The examinee is presented with a personnel problem similar to the one just described, but with considerably more detail given. The examinee is then asked to rate several possible solutions, including hiring temporary workers, hiring full-time employees but warning them that they may be let go if product demand drops, mixing the two solutions, or letting each division within the plant decide upon its own solution. 
Reference: Hedlund et al., 2005. 

The environment offers different degrees 
of support for acquiring knowledge, rang¬ 
ing from presenting examples offered with¬ 
out comment to presenting ideas in formal 
instruction, either in or out of school. Peo¬ 
ple will vary in the extent to which they 
can articulate how or why they take cer¬ 
tain actions. For instance, most people learn 


sports like bicycling and skiing by being 
coached. However, relatively few skilled 
skiers and bicyclers become good coaches, 
in part because they cannot verbalize what 
the novice is supposed to do. The concepts 
of tacit knowledge and explicitly instructed 
knowledge represent end points on a contin¬ 
uum, rather than mutually exclusive classes. 
In the narrowest sense, tacit knowledge 
should be knowledge that people acquire 
by observation and experience, without any 
form of instruction. As an amusing example, 
think of the way that pre-adolescents 
learn to curse. The learning phase usually 
includes some hilarious misuses of improper 
language. 

At times Sternberg and his colleagues 
write as if tacit knowledge were identical to 
practical knowledge. This blurs an impor¬ 
tant distinction. Much practical knowledge 
is acquired through formal instruction. For 
example, airline captains have a great deal of 
job-relevant knowledge about how to fly air¬ 
craft. Much of it is acquired in flight school. 
As late as the 1960s Polynesian navigators in 
the Marianas Islands went to formal schools 
to learn how to sail outrigger canoes across 
vast reaches of the Pacific. 29 Simply showing 
that knowledge is practical does not ensure 
that it has been acquired tacitly. 

This restriction does not deny the impor¬ 
tance of tacit knowledge. Most apprentice¬ 
ship learning is tacit. The master and the stu¬ 
dent have a problem to solve, and the master 
solves it, with the assistance of the student, 
who, hopefully, will later be able to act on 
his or her own. During problem solving the 
master may or may not explicitly instruct 
the student, but what instruction there is 
is almost always in the context of practical 
problem solving. 

The research issue is whether differ¬ 
ent cognitive skills are required for learn¬ 
ing by observation, apprenticeship learning, 
and formal classroom learning. It could also 
be that general intelligence is a substantial 
requirement for all forms of learning. The 
question is one to be settled by investiga¬ 
tion, not by arguments that one or the other 
conclusion must be true. 

5.2.3. The Evidence for the Theory 
of Successful Intelligence 

Validation of Sternberg’s theory requires 
that tests be defined for each of the three 
abilities, that the abilities be substantially 
uncorrelated (for otherwise they might 
simply be reflections of the pervasiveness 
of general intelligence), and that the 

29 Hutchins, 1983. 


uncorrelated portions of the tests of aca¬ 
demic, creative, and practical intelligence 
predict important behaviors both within and 
outside of an academic environment. In the 
book Practical Intelligence in Everyday Life, 
and in several publications since then, Stern¬ 
berg has claimed that these criteria have 
been met. 30 This claim has been disputed. 31 
It would not be possible to go through point 
and counterpoint of every study. I have 
selected a few examples to highlight the 
issues involved. 

Three studies are relevant to the predic¬ 
tion of academic knowledge in three differ¬ 
ent educational settings. 

In one study Sternberg and his colleagues 
administered tests of analytic, creative, and 
practical intelligence to over three hundred 
students attending a pre-college preparatory 
course at Yale University. At the time the 
students were also taking an introductory 
psychology course. This made it possible 
to construct a three-by-three experimental 
design, in which students were categorized 
as having strengths in analytic, creative, 
or practical intelligence, and then were 
assigned to study sections where the instruc¬ 
tion stressed analytic, creative, or practical 
problem solving. Sternberg claimed that stu¬ 
dents performed best when the method of 
instruction matched their strengths in the 
appropriate type of intelligence. 32 

Nathaniel Brody, the author of several 
textbooks and a number of scholarly com¬ 
mentaries on intelligence, made several crit¬ 
icisms of this study. 33 He noted that the 
sample had been pre-selected to be high 
on general intelligence, for they were, after 
all, students intending to enter an extremely 
selective university. The factorial experiment
supporting the crucial aptitude × treatment
interaction dealt with only 199 of the
324 students in the study. When the entire 

30 Sternberg et al., 2000. See also his reply to criti¬ 
cisms (Sternberg, 2003b) and a claim for the utility 
of his measures in the selection of college students 
(Sternberg, 2006, 2007) and in general education 
(Sternberg, Grigorenko, & Zhang, 2008). 

31 Gottfredson, 2003a; Hunt, 2008. 

32 Sternberg et al., 1996, 1999; Sternberg, Grigorenko, 
& Zhang, 2008. 

33 Brody, 2003. 
sample was considered there was no evidence
of an aptitude by treatment interaction.
Finally, Brody reanalyzed Sternberg’s
data and concluded that the predictive abil¬ 
ity of the three allegedly independent tests 
of different types of intelligence was due 
solely to a common trait that underlay them 
all. Brody felt that this represented general 
intelligence. 

Sternberg replied to Brody by saying that 
he regarded the Yale study as a preliminary 
one, that he and his colleagues had devel¬ 
oped better tests of creative and practical 
intelligence, and that positive results would 
shortly be forthcoming. 34 Sternberg subse¬ 
quently published the results of a very large- 
scale study, the RAINBOW study, involv¬ 
ing a number of universities and colleges, 
in which an attempt was made to predict 
first-year grade point averages (GPA) from
a combination of SAT scores and scores 
on tests of analytic, creative, and practical 
intelligence. 35 The RAINBOW study cov¬ 
ered a diverse set of institutions, ranging 
from California community colleges to Ivy 
League universities. One of the tests of cre¬ 
ativity did improve predictivity beyond the 
predictivity achieved using the SAT alone, 
but the tests of practical ability did not add 
anything beyond predictions using the SAT 
alone. 

In a nontechnical note addressed to col¬ 
lege administrators, Sternberg claimed that 
predictivity was increased by 15%, in terms 
of variance accounted for. This is a sub¬ 
stantial gain indeed. 36 Since the SAT pre¬ 
dicts 10-15% of the variance in performance 
of accepted students, if we accept Stern¬ 
berg’s statement at face value, a test pro¬ 
gram that incorporated Sternberg’s creativ¬ 
ity tests along with the SAT or ACT could 
predict somewhere between 15% and 25% of 
the variance in first-year GPA. 
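
The phrase "variance accounted for" can be made concrete with a small calculation. The sketch below is illustrative only: the three correlations are hypothetical values chosen for the example, not figures from the RAINBOW report, and the function names are invented here. It applies the standard formula for the squared multiple correlation with two predictors to show how much a second test adds to a first.

```python
def r2_one_predictor(r_y1: float) -> float:
    """Proportion of criterion variance explained by a single predictor."""
    return r_y1 ** 2


def r2_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation for two predictors, from their correlations.

    r_y1, r_y2: correlation of each predictor with the criterion.
    r_12: correlation between the two predictors (must not be +1 or -1).
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)


def incremental_r2(r_y1: float, r_y2: float, r_12: float) -> float:
    """Variance explained by predictor 2 over and above predictor 1."""
    return r2_two_predictors(r_y1, r_y2, r_12) - r2_one_predictor(r_y1)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not RAINBOW results.
    sat_gpa = 0.35  # SAT with first-year GPA (r squared is about .12)
    new_gpa = 0.30  # new test with first-year GPA
    sat_new = 0.40  # SAT with the new test
    print(f"SAT alone:      {r2_one_predictor(sat_gpa):.3f}")
    print(f"SAT + new test: {r2_two_predictors(sat_gpa, new_gpa, sat_new):.3f}")
    print(f"Increment:      {incremental_r2(sat_gpa, new_gpa, sat_new):.3f}")
```

With these made-up inputs the second test adds much less than its own squared correlation with GPA, because part of what it predicts is already predicted by the SAT. If the new test relates to GPA only through what it shares with the SAT (that is, r_y2 equals r_y1 times r_12), the increment is zero.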

Given this claim, a close look at the data 
supporting it is in order. 

The added predictivity was due to a sin¬ 
gle test of creative ability, in which exami- 

34 Sternberg, 2003b. 

35 Sternberg, 2006. 

36 Sternberg, 2007. 


nees saw a collection of pictures, then made 
up and told the examiner a story about the 
pictures. Thus the test was rather like view¬ 
ing comic book pictures without the cap¬ 
tions, and then telling the story. Surpris¬ 
ingly, there was no gain in predictivity if the 
examinees had to write the story. Asserting 
that “Creative Intelligence” added to predictivity
is something of an overgeneralization,
since one test worked and the other did not. 
More generally, it is difficult to interpret the 
RAINBOW results because of a statistical 
problem. 

An attempt was made to establish a rela¬ 
tionship using multiple tests of both prac¬ 
tical and creative intelligence, and one of 
them worked. This is a reasonable model for 
exploratory research, but it opens the gates 
to capitalization on chance results. In such 
cases the only thing to do is to attempt to 
replicate the findings. 
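
The risk of capitalizing on chance can be quantified under a simplifying assumption. The sketch below assumes that each candidate test is evaluated independently at the same significance level and that none has any true relationship to the criterion. Tests in a real battery are correlated, so the figures are only a rough guide. The function name and the numbers of tests shown are hypothetical.

```python
def chance_of_spurious_hit(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_tests null results looks significant.

    Assumes the tests are statistically independent (a simplification) and
    that each is judged at significance level alpha.
    """
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    # Complement of "every one of the n_tests fails to reach significance".
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Illustrative battery sizes only; not the number of tests in any study.
    for n in (1, 3, 5, 10):
        print(f"{n:2d} tests: {chance_of_spurious_hit(n):.3f}")
```

The probability of a spurious "hit" rises with every test added to the battery, which is why a single successful predictor among several candidates calls for replication.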

Because a result could have been obtained 
by chance does not mean that it was. The 
result is consistent with previous research on 
creativity, especially Torrance's work, cited 
earlier. According to Sternberg the RAIN¬ 
BOW project is to be expanded. 37 If the 
results do replicate, a good case will have 
been made for including tests of the ability 
to produce fictional material as an evalua¬ 
tor of college readiness. 38 This is an impor¬ 
tant theoretical finding. Whether or not this 
means that Sternberg has provided a theo¬ 
retical case for Creative Intelligence, apart 
from the sorts of testing of linguistic com¬ 
petence that occur in conventional testing, 
is a somewhat different question. 

In order to make a case for creativity 
as a predictor Sternberg and his colleagues 
will have to show that several creativity 
measures predict success in college, beyond 
the prediction that can be obtained using 

37 ibid. 

38 The most recent versions of the SAT include a test 
of writing ability. If Sternberg and colleagues’ results 
depend upon verbal fluency, they should not add to 
predictivity beyond that obtained by the expanded 
SAT. If the key point is creativity, however, then 
their tests should be unique predictors, because the 
SAT writing requirement is, in terms of content, 
rather prosaic. Grading is for English usage rather 
than for literary merit. 
standard tests, such as the SAT and ACT.
Ideally there should be some nonverbal cre¬ 
ativity measures, to avoid confounding cre¬ 
ativity and verbal performance. 

The RAINBOW study failed to sup¬ 
port Sternberg’s theorizing in one important 
way. None of the measures of practical intel¬ 
ligence in an academic setting added to the 
amount of prediction that could be obtained 
with the SAT alone. This finding is consis¬ 
tent with Gottfredson’s claim 39 that tests of 
practical intelligence are actually evaluating 
general intelligence (g) rather than evalu¬ 
ating a separate dimension of ability, as is 
claimed in Sternberg’s Theory of Successful 
Intelligence. 

Recognizing this and similar concerns, 
Sternberg and his colleagues have acknowl¬ 
edged that the Theory of Successful Intel¬ 
ligence provides one among several possible 
ways of motivating the empirical studies and 
organizing their results with respect to aca¬ 
demic accomplishment. 40 

The RAINBOW study attempted to do 
many things, using a complicated design. 
It is useful to look at a more focused 
study, in which Sternberg and his colleagues 
attempted to predict the performance of 
students in their first year of a Master of 
Business Administration (MBA) program at 
the University of Michigan. 41 

Students in the Michigan program took 
two tests of practical intelligence - a test 
of their ability to analyze and recommend 
actions on realistic business problems, and 
a situational judgment test similar to the 
one described in Table 5.1. An attempt 
was made to predict their first-year course 
grades, either using the two tests and the 
students’ scores on the Graduate Manage¬ 
ment Admissions Test (GMAT), which they 
had taken prior to entry, or using only 
the GMAT. The tests improved prediction 
over the predictions based on the GMAT 
alone. This finding is clearly what would 
be predicted by the Theory of Successful 
Intelligence. 

39 Gottfredson, 2003a,b. 

40 Sternberg, 2006, p. 322. 

41 Hedlund et al., 2006. 


Sternberg and his colleagues have com¬ 
pleted a number of studies outside of aca¬ 
demic settings. These include both stud¬ 
ies of industrial performance and studies 
of intelligence in developing countries. A 
study of military leadership is of particular 
interest, both for its own importance and 
because it is illustrative of potential indus¬ 
trial applications. 42 

The first part of the study consisted of an 
extensive review of military leadership skills, 
including examinations of leadership duties 
at three different military levels: platoon 
leaders, company commanders, and battal¬ 
ion commanders. The distinction is impor¬ 
tant because the type of leadership required 
differs across these levels. 

Platoon leaders lead by direct, face-to- 
face contact with from twenty-five to forty 
soldiers. Platoon leaders are almost invari¬ 
ably lieutenants, recently commissioned 
officers. They are assisted by three to five 
sergeants, noncommissioned officers who 
have had less formal command training than 
the lieutenants, but may have had consider¬ 
ably more experience in the military. 

Company commanders are usually cap¬ 
tains or senior first lieutenants. In the Amer¬ 
ican services they will have had roughly 
five to ten years’ experience in the military. 
Depending on the type of company, they 
will command from one to two hundred sol¬ 
diers, organized into three or four platoons. 
The company commander has a small staff, 
including an executive officer (usually a lieu¬ 
tenant) and a few senior sergeants. His job 
mixes administrative duties with a good deal 
of face-to-face leadership of platoon leaders 
and sergeants. 

Battalion commanders are majors or, 
more likely, lieutenant colonels. They will 
have had at least ten years’ experience as 
officers. Depending upon the type of bat¬ 
talion, they will be in command of any¬ 
where from three hundred to six hun¬ 
dred soldiers, organized into companies or 
analogous units. Battalion commanders are 
essentially administrative leaders. They are 
assisted by staffs containing experienced 

42 Hedlund et al., 2003. 
officers and senior sergeants, who develop
plans and relay orders to company com¬ 
manders and, rarely, individual platoon 
leaders. 

Sternberg and his colleagues developed 
three measures of what they describe as tacit 
knowledge of military leadership for each 
level of command. These covered a num¬ 
ber of practical aspects of leadership, rang¬ 
ing from motivating subordinates to com¬ 
municating with senior officers. In addition 
they utilized a standard measure of verbal 
reasoning (similarities and analogies), and a 
measure of tacit knowledge developed for 
use with civilian managers. The officers were 
rated for their leadership ability by their 
superiors, peers (except for battalion com¬ 
manders), and subordinates (except for pla¬ 
toon leaders). 

The question of interest is whether the 
measures of military tacit knowledge pre¬ 
dicted leadership ratings, over and above the 
prediction possible from use of the verbal 
reasoning measure. A secondary question is 
whether the measures designed for the mil¬ 
itary are more accurate predictors of lead¬ 
ership ratings than the measure derived for 
use in civilian settings. According to the the¬ 
ory, the military measures should be more 
accurate because they evaluate knowledge 
required by the local situation. 

The results presented a mixed bag. The 
military knowledge tests were the best pre¬ 
dictors of leadership ratings, but no test 
did particularly well. The six possible cor¬ 
relations between the military knowledge 
questionnaires and rated leadership ranged 
from .46 to −.11. What is even more puzzling
is the pattern of correlations. The .46
represents a positive relation between the
military knowledge test and the effectiveness
ratings of battalion commanders by
their superior officers. The −.11 represents
a relationship between the same test and 
the ratings the same battalion commanders 
received from their subordinates! Similar 
but not quite so puzzling anomalies were 
found in the ratings of platoon leaders and 
company commanders. A further statistical 
analysis showed that knowledge of military 
leadership did have some predictive validity 


beyond the other measures. However, the 
effects were far from spectacular. Only three 
of the seven possible comparisons were sta¬ 
tistically reliable, and only one, prediction 
of superiors’ ratings of battalion comman¬ 
ders, was large enough to be of practical 
importance. 
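
One way to read the two extreme correlations reported above is to square them, which gives the proportion of variance that the test and the ratings share. The sketch below does only that arithmetic. The function name is invented for the illustration, and the labels simply restate which ratings each correlation refers to.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance shared by two measures with correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


if __name__ == "__main__":
    # The two extreme correlations quoted in the text for battalion commanders.
    for label, r in (("superiors' ratings", 0.46),
                     ("subordinates' ratings", -0.11)):
        print(f"{label}: r = {r:+.2f}, shared variance = {shared_variance(r):.3f}")
```

On this reading the strongest correlation corresponds to about a fifth of the variance in ratings, and the weakest to about one percent.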

As in the case of the studies of academic 
performance, this is the sort of study that is 
suggestive, but replication is required before 
strong conclusions can be drawn.

The studies in non-Western cultures 
include studies of a pastoral/farming com¬ 
munity in rural Kenya, health practices in a 
Russian city, and hunters in an Alaskan Inuit 
community. 43 

The Alaskan study 44 provides typical 
results. The participants were Yup’ik Inuit
adolescents residing in several small towns. 
The people in this area are best described as 
semitraditional; they live in settled towns, 
go to Alaskan state schools, and participate 
in the normal American economy. They 
also do a considerable amount of hunting 
and, to some degree, follow traditional social 
customs. Scores on a test of knowledge of 
hunting lore and Inuit terms were largely 
independent of scores on conventional tests 
of intelligence. The scores on the tests of 
Inuit knowledge correlated with estimates 
of hunting skills. Similar results were found 
in a study in Kenya, where the targeted 
information was knowledge of traditional 
medicine. 

Both these studies illustrate a simple prin¬ 
ciple. People will learn to solve the cognitive 
challenges their societies present. There will 
be individual differences in the amount of 
learning, and such differences will predict 
performance in that society. 

5.2.4. Evaluation and Critique
of the Theory 

Sternberg’s approach can be evaluated on 
three grounds: his criticisms of other theories,
the logic of his own theory, and the
evidence he offers for it. 

43 Sternberg, 2004. 

44 Grigorenko et al., 2004. 
CRITIQUE OF THE ATTACK
Sternberg’s attacks on other theoreti¬ 
cal approaches are excessive, especially 
when he writes for nontechnical audiences. 
“Everything people have been told about 
intelligence” is not false, and intelligence 
tests do not solely evaluate “narrow abilities 
of use only in school settings.” He makes
a case against conventional tests by offer¬ 
ing anecdotes about failures in prediction, 
including some of his personal experiences. 
This is an effective rhetorical device, but it 
is not a valid argument, unless he is argu¬ 
ing against a claim that the tests are per¬ 
fect predictors of performance. No one has 
ever made such a claim. The claim that is 
made, and supported by massive evidence, is 
that the tests, although far from perfect pre¬ 
dictors of performance, are the best predic¬ 
tors that we have, both in academic and 
industrial settings. 45 

Nonetheless, Sternberg has a serious 
point. Since Binet’s time psychometricians 
have seen their task as one of measur¬ 
ing intelligence, and then simply reporting 
the result. There has been a great deal of 
research on increasing test validity and mak¬ 
ing test administration more efficient. What 
has been missing is concern for what hap¬ 
pens after test scores have been obtained. 
There is an implicit (and often not-so- 
implicit) assumption that personnel decision 
makers will establish a cut score, and then 
either accept or reject candidates. Reality is 
more complex. Educators have to do some¬ 
thing with the students they have; industrial 
supervisors have to design jobs and train¬ 
ing programs suitable for the available work¬ 
force; and military commanders have to lead 
the troops who have enlisted. What Stern¬ 
berg has done is to move away from a pure 
selection model for the use of test scores to 
a model where test scores are meshed with 
training and educational decisions. 

This is not a novel idea, for industrial- 
organizational psychologists have been 
doing this for some time, especially in 

45 For reviews, see Gottfredson (1997), Hunt (1995),
and Schmidt & Hunter (1998). The topic is explored
in Chapter 10.


military settings. For instance, in addition 
to the AFQT, the ASVAB battery provides 
scales indicating skills or knowledge in spe¬ 
cial fields (e.g., electronics). These scales 
are then used, along with other criteria, to 
assign soldiers to training programs for var¬ 
ious military specialties. In educational and 
most civilian industrial applications, though, 
it is much more common for test scores to 
be used to determine whether an applicant 
is to be selected or not, without using test 
scores to tailor experiences for the individ¬ 
ual after he or she has been selected. As 
was illustrated by the Yale study, Sternberg 
has called educators’ attention to the pos¬ 
sibility that training after selection could be 
guided by test scores indicating the student's 
strengths and weaknesses. This is an ambi¬ 
tious, needed step forward that has been 
totally ignored by most psychometrically 
oriented researchers in the field. 

CRITIQUE OF THE THEORY 
To what extent have Sternberg and his 
colleagues offered new theoretical insights 
about intelligence? In order to answer this 
question it is important to make a distinc¬ 
tion between pragmatic advances in pre¬ 
dicting important behaviors and theoretical 
advances in the study of intelligence. Both 
are valuable, but they are not the same. It 
is possible to make an important pragmatic 
advance in personnel selection, by calling 
attention to the importance of some aspect 
of intelligence already covered by other the¬ 
ories, but not being used for selection, with¬ 
out making any advance in the study of 
intelligence. 

In my opinion, Sternberg’s work on cre¬ 
ative intelligence represents a pragmatic 
advance. Sternberg was not the first to dis¬ 
tinguish between creativity and intelligence, 
nor was he the first to develop creativity 
tests. He has made an important contribution
by calling attention to something that
was in the literature, but had “dropped off 
the radar” of current test designers. 

Sternberg's notion of practical intelli¬ 
gence is, at the theoretical level, close 
to Cattell and Horn’s view of crystallized
intelligence (Gc). Both tests of practical 
intelligence and tests of Gc stress the use
of previously acquired knowledge to solve 
a current problem. Where Sternberg and 
his colleagues differ from conventional test 
developers is in how they have instantiated 
the concept to create tests. 

In conventional psychometric batteries 
tests for Gc are designed to evaluate peo¬ 
ple's ability to use the knowledge and ideas 
that are needed in industrial/post-industrial 
societies, and that are often explicitly taught 
in our schools. They are therefore cultur¬ 
ally limited, albeit to a very large, impor¬ 
tant culture. They are further limited by 
concentrating on the evaluation of a sort 
of common denominator of what people 
in modern cultures are supposed to know. 
This is reasonable if the test is to be 
used for some broad swath of the popu¬ 
lation in an industrial/post-industrial soci¬ 
ety, such as applicants to college. Sternberg 
and his colleagues have demonstrated the 
feasibility of developing tests of Gc, their 
"practical intelligence," for non-Western, 
nonindustrial societies and for specialized 
segments of the sprawling post-industrial 
society. Cattell said that this would have to 
be done over half a century ago. 46 Stern¬ 
berg did it. He was not the first to do so. 
Industrial-organizational psychologists have 
been developing specialized tests for years. 
Situational judgment testing is a good exam¬ 
ple. Sternberg and his colleagues have done 
an effective job of showing the need for such 
specialized testing in other areas. 

When Sternberg and his colleagues ques¬ 
tion the utility of a test of, say, knowl¬ 
edge of English vocabulary to predict perfor¬ 
mance in rural Kenya, they are not attacking 
the concept of Gc. They are attacking the 
overgeneralization of the conventional real¬ 
ization of that concept, beyond the context 
in which the test was developed. 

All in all, Sternberg’s emphasis on prac¬ 
tical intelligence, if heeded, will represent 
an important, albeit not unique, pragmatic 
advance in testing. 

The concept of tacit intelligence is a 
unique theoretical contribution. A great 

46 Cattell, 1957, 1971. 


deal of mental competence is based on infor¬ 
mation and skills that we acquire without 
explicit instruction. This is particularly true 
of the acquisition of language and social cus¬ 
toms. Billions of people - literally - speak 
their native language and get along in their 
society without explicit knowledge of either 
the rules of syntax or the rules of etiquette. 

Although Sternberg and his colleagues do 
not stress this, there are both input and out¬ 
put issues in the use of tacit intelligence. 
On the input side, are some people more 
able than others to pick up the unspoken 
rules? Is this a general talent, or is it specific 
to particular social situations? On the out¬ 
put side, recent research in decision making 
has shown that in some situations people 
are well advised to accept their gut reac¬ 
tion, rather than making decisions based on 
conscious reflection. In other situations con¬ 
scious reflection is important. 47 Are some 
people better than others at deciding when 
to act with or without reflection? These are 
interesting research questions. Although the 
general field of unconscious cognition has 
blossomed since the beginning of the cen¬ 
tury, there has been little study of individ¬ 
ual differences in either implicit learning or 
unconscious decision making. 

There is no need to comment on Stern¬ 
berg’s analytic intelligence component, for, 
as he says, this is equivalent to standard 
views of general reasoning. 

Sternberg has also extended his interest 
to ask how intelligence is used. He distin¬ 
guishes three intellectual styles. By analogy 
to government, he identifies an executive 
style, which is an interest in putting ideas 
into place; a legislative style, which is an 
interest in creating ideas in the first place; 
and a judicial style, which is an interest 
in analyzing the implications of ideas and 
efforts to put them into place. 48 The extension
stresses an interaction between the “can
do” concept of intelligence and “how to”
personality and interest issues. This work 
is in its beginning. The approach is an 
interesting one, for the borderline between 

47 Klein, 2009. 

48 Sternberg, Grigorenko, & Zhang, 2008. 
intelligence and personality research is
underdeveloped. 

EVALUATION OF THE EVIDENCE 
FOR THE THEORY 

In Practical Intelligence (and in several arti¬ 
cles since that book was published) Stern¬ 
berg and his colleagues claim that there is 
ample evidence for the three different types 
of intelligence, and that practical intelli¬ 
gence, in particular, is an extremely impor¬ 
tant aspect of mental competence. I have 
already expressed some reservations. Other 
reviewers have expressed their reservations 
with more vehemence. 

Linda Gottfredson, a professor of educa¬ 
tion at the University of Delaware who has 
frequently and ably defended classic psycho¬ 
metric theory, published a detailed analysis 
of the evidence that Sternberg and his colleagues
provided in Practical Intelligence. 49
She concluded, 

[The authors of Practical Intelligence]
exaggerate the strength of the empirical sup¬ 
port they summarize. They do so by pre¬ 
senting the most favorable results, over¬ 
stating even those, interpreting inconsistent 
data in ways that produce consistent sup¬ 
port, and giving citations to back up strong 
statements but which do not actually pro¬ 
vide independent support (many are just 
earlier summaries of the same thing) or 
that even contradict the claim in question. 

The authors simultaneously discourage 
the close analysis that would reveal the 
inadequacies of their data and presen¬ 
tation. They do so partly by appealing 
to many people's strong desire to believe 
them, specifically by tapping the popu¬ 
lar preference for an egalitarian plural¬ 
ity of intelligences (everyone can be smart 
in some way) and a distaste for being 
assessed, labeled, and sorted by inscrutable 
mental tests. These sentiments are evoked 
again by casting aspersions on research and 
researchers that have helped reinstate the 
concept of g, or general intelligence. 

Gottfredson, 2003a, p. 392; 
emphasis in the original 


Sternberg replied to Gottfredson but, as 
she pointed out, did not address the spe¬ 
cific criticisms she made of his work. 50 Stern¬ 
berg’s reply did make a point that Gottfred¬ 
son failed to address. He argued that focus¬ 
ing on individual studies can cause one to 
fail to appreciate the broad range of sup¬ 
port for the theory. There is validity to this 
argument. 

My own evaluation is this. 

Sternberg’s critiques of previous work 
are too broad, especially in articles and 
books written for nonspecialists. Neverthe¬ 
less, they have a grain of truth in them, 
particularly in his criticisms of the work 
of researchers who have, either implicitly 
or explicitly, accepted the idea that intel¬ 
ligence is what the intelligence tests test. 
The theoretical ideas are not as new as some 
of his writing would lead one to believe, 
but they do advance the field. I point espe¬ 
cially to his expansion of the concept of Gc 
by the development of practical intelligence 
tests, which could just as easily be described 
as tests of the development of Gc in spe¬ 
cialized contexts. His ideas about possible 
individual differences in the acquisition of 
implicit (tacit) and explicit knowledge are 
very interesting, and deserve further explo¬ 
ration. He is also to be applauded for attend¬ 
ing to the important issue of meshing assess¬ 
ment, training, and educational methods. 

In order to support his theory he and 
his colleagues have carried out a staggering 
number of experiments. Taken individually, 
many of these studies are fairly weak. Taken 
together, they indicate that the augmenta¬ 
tion of conventional tests would improve 
prediction. While the improvements would 
not be large, they would be of practical sig¬ 
nificance for very large personnel selection 
programs, such as admission to college or 
selection of military officers. 

5.3. Ackerman's PPIK Theory 

Philip Ackerman, currently a professor at 
the Georgia Institute of Technology, has 


49 Gottfredson, 2003a. 


50 Sternberg, 2003; Gottfredson, 2003b. 
developed a theory of adult intelligence
that combines cognitive processes, person¬ 
ality, interests, and knowledge, hence the 
PPIK model. In contrast to Gardner and 
Sternberg, Ackerman has been careful to 
point out that his ideas are extensions of 
ideas proposed by earlier researchers. These 
include R. B. Cattell’s Gf:Gc distinction and
J. L. Holland’s discussion of adult vocational 
interests. Ackerman’s extensions are a sub¬ 
stantial and innovative contribution to our 
understanding of intelligence. 

Ackerman was motivated by a paradox¬ 
ical finding in the literature. Many studies 
have shown that psychometrically defined 
general intelligence (g) and, even more 
telling, fluid intelligence (Gf) decline over 
the adult years, beginning at about twenty- 
five. 51 Yet the world seems to be run by 
adults over the age of thirty-five. Acker¬ 
man offers the following example. When 
the president of Russia, Boris Yeltsin, had a 
heart attack, one of the attending surgeons, 
Michael De Bakey, was in his eighties. Age 
was not the point; Dr. De Bakey had per¬ 
formed several thousand cardiac operations. 

This is not an isolated example. As we 
entered the twenty-first century more than 
half the physicians in the United States 
were over forty-five. 52 Most presidents of the 
United States took office when they were 
in their fifties. This group includes Wash¬ 
ington (fifty-seven), Lincoln (fifty-two), and 
Franklin Roosevelt (fifty-one), arguably the 
three greatest presidents. As of 2007, 90 of 
the 100 United States senators were over 50, 
and 25 were over 70. Perhaps the intelligence 
tests are missing something. 

Following Cattell, Ackerman 53 argues
that as people age they increasingly rely on 
past knowledge to solve problems, rather 
than solving problems by using general 
problem-solving techniques. This is cer¬ 
tainly true for extreme cases. Studies of 
expert performance in fields ranging from 
chess to athletics have repeatedly shown 

51 This topic is discussed further in Chapter 11. 

52 Commission on Graduate Education in Medicine,
sixteenth annual report, 2005.

53 Ackerman, 1996; Ackerman & Heggestad, 1997.
that very high levels of performance are
reached after a great deal of deliberate prac¬ 
tice, resulting in well-organized knowledge 
in the domain of expertise. 54 The same prin¬ 
ciples apply, to a lesser degree, to all adults. 
Older people do worse than younger ones 
on conventional intelligence tests, especially 
those that stress fluid intelligence, but they 
do better than younger ones when asked 
to generate solutions to realistic problems, 
ranging from an adolescent’s desire to leave 
home to problems in credit management 
and personal health. 55 

Ackerman was also impressed by the 
extent to which adult knowledge is spe¬ 
cialized. Physicians, business entrepreneurs, 
salespeople, mechanics, and farmers have 
to meet the challenges of their particular 
social niche. The situation is somewhat anal¬ 
ogous, although not so extreme, to the con¬ 
ditions that led to Sternberg’s emphasis on 
culture-specific knowledge. Ackerman dis¬ 
tinguishes between general cultural knowl¬ 
edge (Gc) and knowledge of the more spe¬ 
cialized culture in which the individual lives 
(Gk). Moreover, specialization is not lim¬ 
ited to professional and technical knowl¬ 
edge. People in our post-industrial society 
will share experiences and interests with 
others in groups that are broadly defined, 
coherent within themselves, but different 
from each other. Experienced university 
professors, military officers, businessmen, 
and physicians will all share some cultural 
knowledge (Gc) of modern society, but they 
will differ in the particular knowledge that 
they have acquired within their domains 
of interest (Gk). These differences have to 
be considered when evaluating their intelli¬ 
gence. 

5.3.1. PPIK Theory

Ackerman proposes that evaluations of adult 
effectiveness consider four traits. The first 
two of these are what we might normally 
consider aspects of intelligence. Intelligence 
as process refers to the ability to deal with 

54 Ericsson, 1996. 

55 Baltes & Smith, 1990; Baltes & Staudinger, 2000. 
novel, arbitrary, abstract problems. This
aspect of intelligence corresponds closely 
to the Cattell-Horn view of fluid intelli¬ 
gence (Gf), and to the mental dimension of 
general reasoning (g) advocated by Jensen, 
Gottfredson, and others. Intelligence as 
knowledge is divided into two components: 
general cultural knowledge (the Cattell- 
Horn Gc, as conventionally measured) and 
knowledge relevant to the individual’s inter¬ 
ests and general role in society (Gk), includ¬ 
ing but not limited to the person’s economic 
role. 

Ackerman stresses the development of 
intelligence from the later childhood years 
through adolescence and on to maturity. His 
view is that over these periods intelligence is 
developed by a series of investments of cur¬ 
rent cognitive competence into knowledge¬ 
building experiences that both increase and 
temper the nature of future cognitive com¬ 
petencies. A person’s personality and inter¬ 
ests determine what experiences he or she 
will have, and therefore will guide the devel¬ 
opmental process. 

During childhood and adolescence com¬ 
petence is largely guided by the process 
aspect, as the child acquires general cul¬ 
tural knowledge through interaction with 
a relatively uniform environment, at least 
in the range typical of middle-class soci¬ 
ety in the developed countries. The situa¬ 
tion is gloomier for children in extremely 
poor home or school environments, which 
simply do not offer much opportunity to 
acquire the early stages of Gc (e.g., skill in 
reading). In the later school years knowl¬ 
edge begins to build on knowledge, so that 
Gc acquired during, for instance, the first six 
years of school combines with maturing pro¬ 
cessing capacities (Gf) to guide acquisition 
of cultural knowledge in the next six years. 
This accounts for two observed phenomena, 
the increasing stability of test scores as chil¬ 
dren progress through school and the high 
validity of IQ scores obtained in early mid¬ 
dle school as predictors of accomplishment 
by the end of high school. 

Opportunities for specialized experi¬ 
ences arise in the later high school years, 
and continue through young adulthood, in 


the choice of jobs and/or college majors. By 
the early to mid twenties there will be a shift 
from acquisition of general knowledge of the 
culture to acquisition of knowledge relevant 
to specific sectors of society. Throughout 
both the school and early adult years intel¬ 
ligence plays a crucial role, for this is what 
makes the difference between having expe¬ 
riences and acquiring knowledge. Table 5.2 
makes this point, by showing the correla¬ 
tions between tests of intelligence and tests 
of domain-specific knowledge. Ackerman 
and Beier have also demonstrated this point 
experimentally. Adults were first tested for 
their knowledge of the management of per¬ 
sonal finances, clearly an important topic 
to us all. They were then given instruction, 
and tested again. Both pre-test and post-test 
performance and improvements in perfor¬ 
mance were predicted by tests of Gf and Gc. 
Perhaps the smart do not just get smarter 
over time; they may also get richer. 

While environmental constraints un¬ 
doubtedly play a part (a variable that 
Ackerman does not discuss), in a noncaste 
society personal interests and personality 
will play a substantial role in the person’s 
choice of social niche, and therefore in 
the subsequent development of knowledge. 
This means that people who differ in how 
much they know about various domains will 
differ in other ways as well. 

At the same time, due to biological effects 
associated with aging, there will be a decline 
of the processes supporting intelligence, that 
is, a decline in Gf. However, this does not 
mean that older people are less intelligent. 
Intelligence begins to differentiate. There is 
a decline in the processes supporting reason¬ 
ing (e.g., concentration of attention, short¬ 
term memory, time to retrieve information) 
but an increase in the efficiency with which 
those processes are used, providing that the 
person is operating within a domain where 
his or her previously acquired knowledge is 
relevant. 

Ackerman 56 has identified five domains 
of knowledge that are at once broader than 
the knowledge associated with professional 

56 Ackerman & Heggestad, 1997.
Table 5.2. Selected data on the relation between knowledge and intelligence test scores
and age. All participants were college graduates twenty-one to sixty-five years old.
Twenty-five percent had advanced degrees. Due to restriction of the range of the test
scores, the estimates of correlations with intelligence are probably underestimates of the
correlations in the general population.

Content of Test     Correlation with Gf   Correlation with Gc   Correlation with Age
Chemistry           .516                  .385                  -.240
Physics             .568                  .5*5                  -.316
Electronics         .347                  .45 1 2               .200
Law                 .186                  .379                  .189
Geography           .320                  .525                  .150
Art                 .168                  .468                  .224
World Literature    .156                  .600                  .248

Source: From Ackerman, 2000, Table 4. By permission of Oxford University Press and the author.


skills, yet narrower than the general cultural 
knowledge evaluated by a typical test for 
crystallized intelligence. The domains are 
knowledge about physical science, math¬ 
ematics, the arts, literature, and social 
science. 

Knowledge in each domain is not isolated; 
it is part of a trait complex that includes the 
intelligence required to work in a domain 
along with the personality and interests that 
lead someone to decide to work in the 
domain in the first place. To give these ideas 
some content, imagine a longitudinal study 
in which we followed students who, as they 
leave high school, vary in Gc and Gf, as con¬ 
ventionally measured, and also vary along 
the dimensions of interest in abstract investi¬ 
gations or social issues, and in conscientious¬ 
ness. Subsequently, when they are in their 
thirties and forties, we test this group for 
their knowledge of scientific and social top¬ 
ics. According to the PPIK model we would 
find the following relationships: 

1. Interest in abstract investigations, mea¬ 
sured upon graduation from high 
school, would be related to adult knowl¬ 
edge of science. Interest in social issues, 
once again measured as a high school 
student, would be positively related to 
knowledge of social situations. 

2. Knowledge of social situations and 
knowledge of science would be posi¬ 


tively related to general intelligence (g), 
measured as an amalgam of scores on 
tests of Gf and Gc. (We expect the two 
to be substantially but not perfectly cor¬ 
related.) However, the relative weight¬ 
ings given to Gf and Gc might dif¬ 
fer depending upon whether we were 
trying to predict knowledge of science 
(which is often relatively abstract) or 
knowledge of concrete social situations. 

3. With Gf and Gc and interests held 
constant, statistically, there should 
be a positive relationship between 
conscientiousness and knowledge. 

As far as I know, this hypothetical study 
has yet to be done. However, Ackerman and 
his colleagues have reported cross-sectional 
studies that closely resemble it. 57 These 
studies revealed four trait complexes, pro¬ 
duced by the correlations between domain 
knowledge, intelligence test scores, inter¬ 
ests, and personality traits. They are 

1. The science and math complex: This complex
is defined by relatively high fluid
intelligence (g or Gf) and by interest 
in realistic problems and in investiga¬ 
tion of abstract issues. People with high 
knowledge of science and mathematics 

57 Ackerman, 2000; Ackerman & Beier, 2006; Acker¬ 
man & Rolfhus, 1999. 
issues tend to have high scores on this
complex. 

2. The conventional complex: This complex
is defined by interest in conventional 
topics, conservatism, and conscientious¬ 
ness. Abilities are widely distributed 
across this complex. 

3. The social complex: This complex is associated
with an interest in social relations
and extraversion. Abilities are widely 
distributed across this complex. Some 
studies indicate that the social com¬ 
plex should be split into two com¬ 
plexes, one identifying people inter¬ 
ested in social power and leadership, 
and the other identifying people who 
value close social relations. 

4. The intellectual/cultural complex: This 
complex is associated with knowledge 
of the arts and humanities. It is also 
characterized by the personality traits 
of openness to experience and a ten¬ 
dency to become absorbed in intellec¬ 
tual activities. The trait is also associ¬ 
ated with relatively high scores on tests 
of Gc. 

These descriptions are my gloss of results 
from Ackerman’s studies. Slightly different 
complexes, with slightly different descrip¬ 
tions, have been found in each study. This 
is not surprising, for the studies gener¬ 
ally involve from one to two hundred par¬ 
ticipants, recruited opportunistically (e.g., 
by newspaper advertisements) rather than 
by random sampling from a definable 
population. 

5.3.3. Evaluation of the PPIK Theory 

I believe that the PPIK approach has a great 
deal of potential. Ackerman and his col¬ 
leagues have demonstrated the need to con¬ 
sider domain knowledge as an attribute of 
adult intelligence and the importance of 
considering how personality and interests 
guide the investment of current intelligence 
to produce future knowledge. 

I suspect that there is more to Ackerman's 
concepts than the current studies have sug¬ 


gested. As of 2010, Ackerman's studies have 
concentrated predominantly on the study of 
college students and adults who have had 
at least some college. It seems likely that 
if studies of trait complexes were to be 
extended beyond the largely middle-class, 
college-educated urban population, more 
trait complexes would be uncovered. Their 
influence may be very powerful indeed. 

5.4. Personality Variables Related 
to the Development and Use 
of Intelligence 

The study of personality has developed 
almost independently of the study of intel¬ 
ligence. This is a bit surprising, given the 
amount of research on each topic. There is 
a difference in the sense that intelligence is 
a “can do” concept; it refers to the potential 
for dealing with and extracting useful infor¬ 
mation from the environment. Personality 
traits are generally “will do” concepts; they 
refer to a person's disposition to take certain 
actions and not others. Explanations of “can 
do” and “will do” complement each other. 
Psychology would benefit by an integrated 
approach to the two topics. 

5.4.1. Intellectual Engagement 

Ackerman's ideas on the topic provide a 
good place to begin our discussion. Acker¬ 
man has introduced the idea of intellectual 
engagement. He argues that people differ
in the extent to which they engage conceptually
with their experiences, especially
when such engagement is not required for 
any immediate purpose. Imagine two stock 
brokers, both of whom carefully check the 
financial news, but only one of whom reads 
the rest of the newspaper. The second bro¬ 
ker has more intellectual engagement than 
the first. 

We can go beyond thought experiments. 
Goff and Ackerman developed a scale of 
intellectual engagement, based upon self- 
reports to questions about such things as 
manner of reading a newspaper. Scores on 
this scale can contribute to the prediction
of academic performance, over a three-year 
period, after the prediction due to differ¬ 
ences in intelligence is accounted for. 58 

This research is just in its beginning. The 
concept of intellectual engagement has a 
great deal of promise as an explanation of 
why some people learn broadly, while oth¬ 
ers become progressively narrower in their 
thoughts as they age. 

5.4.2. Self-Discipline

Intellectual engagement refers to the way 
people like to behave. A related approach is 
to look at attitudinal variables that deter¬ 
mine how much effort a person will put 
into doing something he or she should do. 
This brings us to the topic of self-discipline, 
which I interpret as the willingness to do 
something you perceive as worthwhile in 
the long run, even though it means for¬ 
going immediate pleasures. 

There is a substantial body of research 
showing that the personality trait of con¬ 
scientiousness, that is, showing up for work 
and keeping on task, is the best, and in 
many cases the only, predictor of workplace 
performance after the prediction associated 
with general intelligence has been accounted 
for. The message is clear: success in the 
workplace is associated with both intelli¬ 
gence and willingness to work. 59 
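
The phrase "after the prediction associated with general intelligence has been accounted for" can be made concrete with a small calculation. The sketch below uses the standard two-predictor formula for the squared multiple correlation to ask how much variance a second predictor adds beyond a first. The function name and the correlation values are hypothetical illustrations, not figures taken from the studies cited.

```python
def incremental_r2(r_y1, r_y2, r_12):
    """Variance in the criterion explained by predictor 2 beyond predictor 1.

    r_y1: correlation of the criterion with predictor 1 (e.g., intelligence)
    r_y2: correlation of the criterion with predictor 2 (e.g., conscientiousness)
    r_12: correlation between the two predictors
    """
    if abs(r_12) >= 1:
        raise ValueError("the two predictors must not be perfectly correlated")
    # Squared multiple correlation when both predictors are used together.
    r2_both = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    # Subtract what predictor 1 explains on its own.
    return r2_both - r_y1**2


# Hypothetical values chosen only for illustration.
gain = incremental_r2(r_y1=0.5, r_y2=0.3, r_12=0.2)
print(round(gain, 3))  # prints 0.042
```

With these made-up numbers the second predictor adds about four percentage points of explained variance, which illustrates how a trait can be a reliable predictor and still contribute much less than intelligence does.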

The contribution of conscientiousness, 
alone, as a predictor is much smaller than 
the contribution of intelligence. This may 
be because in many workforce situations 
behavior is sufficiently constrained that con¬ 
scientiousness is literally forced to fall into 
a narrow range. The time clock in manufac¬ 
turing is a prototypical example of such a 
situational constraint. 

The role of self-discipline seems to be 
especially important in the school years. 
Modern society educates its young using a 
form of instruction, formal schooling, that 

58 Goff & Ackerman, 1992; Chamorro-Premuzic, Furnham, & Ackerman, 2006.

59 Hunt, 1995b; Schmidt & Hunter, 1998. 


depends upon the student’s accepting the 
idea of delayed gratification, for students are 
asked to trade present fun for rather nebu¬ 
lous future gains. How well students meet 
this challenge will determine, in part, where 
they stand on the Gc scale when they grad¬ 
uate. Two small, but important, studies in 
social psychology have shown how impor¬ 
tant the relationship between self-discipline 
and success is. 

The first study was conducted by a 
research group at Columbia University. 60 In 
the initial phase of their study pre-school 
children were offered a choice between two 
rewards that varied in value. The catch was 
that the children could either have the lesser 
reward immediately, or they could have the 
more valued one if they waited for a few 
minutes. (That is an eternity to a four-year-old.)
Some fifteen years later the researchers
obtained the SAT scores that the children 
had received upon their applying to college. 
There was a correlation of about .4 between 
the time the children had been able to wait 
for the desired reward, as toddlers, and their 
SAT scores as adolescents. 

Duckworth and Seligman 61 made the 
same point in a study of self-discipline in 
middle-schoolers. The children were mea¬ 
sured on academic achievement, intelli¬ 
gence, and a variety of self-discipline mea¬ 
sures at the beginning of the academic 
year. One measure struck me as particularly 
ingenious. Students were offered a choice 
between $1.00 during the testing session or 
a promise that the experimenter would give 
them $2.00 the following week. This task 
was accompanied by several other ques¬ 
tions in which students rated their choices 
in hypothetical situations where they could 
either take an immediate reward or wait 
for a delayed reward. The measures of self- 
discipline turned out to be better predic¬ 
tors for both grades and final test scores than 
measures of intelligence. The relative value 
is somewhat suspect, as the study partici¬ 
pants were students in a selective “magnet” 

60 Shoda, Mischel, & Peake, 1990. 

61 Duckworth & Seligman, 2005. 
school, so there was probably a limited
range of intelligence. However, the basic 
principle remains the same: self-discipline 
counts. 
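
The effect of a limited range can be illustrated with the textbook correction for direct range restriction (often called Thorndike's Case II). The sketch below is an illustration only: it assumes selection acted directly on the predictor, and the correlation and the ratio of standard deviations are hypothetical values, not estimates from the Duckworth and Seligman study.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Estimate the correlation in the unrestricted population.

    r_restricted: correlation observed in the selected (restricted) sample
    sd_ratio: predictor SD in the full population divided by its SD in the
              selected sample (greater than 1 when the range is restricted)
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    numerator = r_restricted * sd_ratio
    denominator = math.sqrt(1 + r_restricted**2 * (sd_ratio**2 - 1))
    return numerator / denominator


# Hypothetical example: an observed correlation of .30 in a sample whose
# test-score SD is two-thirds of the population SD (ratio 1.5).
print(round(correct_for_range_restriction(0.30, 1.5), 3))  # prints 0.427
```

Under these assumptions an observed correlation of .30 corresponds to roughly .43 in the wider population, which is why comparisons of predictors within a selective school are hard to interpret.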

These two studies were laboratory de¬ 
monstrations, using selected and probably 
rather talented participants. They need to be 
replicated. However, our confidence in their 
conclusion is strengthened because they 
complement many studies that have found 
positive relations between study habits, self- 
discipline, and academic performance. 62 

The moral to draw from this line of 
research is that you will not get smart if you 
do not try to get smart. But people will usu¬ 
ally not try something unless they think that 
they have a reasonable chance of success. 
This brings us to another attitudinal vari¬ 
able, mind-set. 

Carol Dweck and her associates have con¬ 
ducted a large number of studies on the 
relation between belief in one’s own efficacy
and what people actually accomplish. 63
Dweck’s argument, which goes well beyond 
the study of intelligence, is that to a consid¬ 
erable degree success depends upon having 
a positive attitude toward chances of success 
and the self-discipline to get up and act, even 
when success is not guaranteed. As Henry 
Ford said, “It does not matter whether you 
say you can or you can't. Either way you’re 
right.” 64 

Ford was a bit optimistic. In our soci¬ 
ety a person with a tested IQ in the 70s 
or 80s is not a good candidate for gradu¬ 
ate school in mathematics, no matter how 
hard he or she tries. Along with Dweck and 
Ackerman, I do believe that, for any fixed 
amount of intelligence, considerable varia¬ 
tion in accomplishment can be associated 
with individual differences in self-discipline, 
intellectual engagement, and in an attitude 
of being willing to challenge oneself. It is 
not necessary to set up a competition to 
see whether intelligence or personality mea¬ 
sures do the best job of predicting performance.

62 Crede & Kuncel, 2008.

63 Dweck, 2006.

64 Sign posted at the Henry Ford Museum, Dearborn, Michigan.


They are both important. Which one
is more important will depend upon the par¬ 
ticulars of the situation. 

5.4.3. Emotional Intelligence

The study of intelligence has focused on 
individual differences in people’s ability to 
think, when they are thinking coolly and 
rationally. We do not always act that way. 
What about individual differences in our 
ability to deal with emotionally charged 
issues? This question brings us to the topic 
of emotional intelligence. 

The idea of emotional intelligence has 
generated a great deal of popular discus¬ 
sion, although it is not always clear just what 
is being discussed. Professional discussion 
has revolved around three points: the extent 
to which emotional intelligence ought to 
be regarded as an intelligence rather than 
a personality trait; the extent to which emo¬ 
tional intelligence is different from other, 
better-researched personality and intelli¬ 
gence traits; and the extent to which mea¬ 
sures of emotional intelligence are useful 
predictors of achievement in cognitive tasks, 
over and above conventional measures of 
intelligence. 

There is a good case for the existence 
of emotional intelligence. 65 Intelligence is 
defined as the ability to process informa¬ 
tion. Verbal intelligence refers to the ability 
to process verbal information, visual-spatial 
intelligence to the ability to process spa¬ 
tial and visual information, and so forth. 
There is a class of information that, loosely, 
can be referred to as emotional information. 
This includes internal cues about one’s own 
emotions and observations of the behavior 
of others, which provide clues about their 
emotions. Dealing with emotional informa¬ 
tion is as much a cognitive activity as dealing 
with verbal material. 

Before investigating emotional intelli¬ 
gence, we have to decide how it is to be mea¬ 
sured. By far the most common measure is 
a behavioral inventory, in which people are 

65 Mayer, Salovey, & Caruso, 2004; Salovey & Mayer, 1990; Salovey & Grewal, 2005.
either asked to describe themselves or asked
how they would act in a hypothetical situ¬ 
ation. A second, more expensive method of 
assessment is to determine the accuracy with 
which people can identify emotional dis¬ 
plays or perform some emotion-laden task. 
For example, an examinee might be asked to 
identify the emotion displayed by a person 
in a photograph or video display, or to list 
physiological signs known to be associated 
with different emotions. 

Self-descriptions do not fare too well as 
measurements, for they are contaminated by 
respondents' tendency to exaggerate their 
good qualities. This, in itself, is a confound¬ 
ing individual trait, for there are marked 
individual differences in the extent to which 
people bias their self-descriptions toward 
what they perceive as socially desirable ones. 
In the broad sense this is not surprising. 
Shakespeare said as much some years ago. 66 
Even if we disregard a tendency to self- 
glorification, there is no reason to believe 
that people are able to accurately report 
their strength in assessing and controlling 
emotions, relative to other people. Doing so 
would require considerable insight both into 
one’s own emotional intelligence and into 
the emotional intelligence of one’s acquain¬ 
tances. Self-reports of talents are not as 
objective as test scores. 67 

Behavioral measures of emotional intelli¬ 
gence are technically better measures than 
self-reports and have a firmer logical basis. 
Such measures of emotional intelligence are 
only modestly correlated with conventional 
measures of cognitive intelligence. This is 
evidence that we are indeed dealing with 
two separate abilities. 

Further evidence comes from the neuro¬ 
sciences. Following certain types of brain 
injury patients display greatly reduced 
ranges of emotion. In other clinical situa¬ 
tions the affected person will fail to dis¬ 
criminate emotions in others. Individuals 

66 Edwards (1959) discusses the issue of social desirability.
As for Shakespeare, in King Henry V the
king predicts that veterans of the battle of Agincourt
would recount “with advantage” their heroic
deeds.

67 Brody, 2004. 


who have received injuries in those areas 
of the brain that control emotion may dis¬ 
play problems in interpersonal adjustment, 
but have no deficit in abstract cognitive 
performance. 68 

There is no contention that the brain 
mechanisms for dealing with “hot” and
“cold” cognition are entirely separate. All
that needs to be shown is that they are some¬ 
what different. We now know that the per¬ 
ception and activation of emotion are both 
associated with the limbic system, especially 
the amygdala, and the ventromedial pre¬ 
frontal cortex. 69 This contrasts with find¬ 
ings on the neural correlates of intelligence, 
which indicate heavy involvement of the lat¬ 
eral prefrontal cortex and the anterior pari¬ 
etal cortex in cognitive activities. 70 

The combination of the neuroscientific 
and correlational evidence makes a con¬ 
vincing case that there is such a thing as 
emotional intelligence, and that it is not 
the same as cognitive intelligence, although 
there is a small correlation between the 
two. A practical question remains. Does 
emotional intelligence predict academic and 
workplace performance as well, better, or in 
addition to measures of general (cognitive) 
intelligence? 

Here there seems to be a great deal of 
“sound and fury” but, at this time (2010), 
rather little reliable evidence. 

In his popular book on the topic 
Goleman 71 asserted that emotional intelli¬ 
gence is a much more important determi¬ 
nant of behavior than cognitive intelligence. 
He made his case largely by anecdote and 
informal reports. There are a number of 
studies showing that emotional intelligence, 
if measured alone, correlates with desir¬ 
able outcomes in life. The key question, 
though, is whether measures of emotional 
intelligence can predict important behav¬ 
iors, beyond those that can be predicted by 
conventional intelligence and personality mea¬ 
sures. Few such studies have been reported, 

68 Bar-On et al., 2003. 

69 Damasio, 1994, 1998. 

70 Jung & Haier, 2007. 

71 Goleman, 1995. 
and those that have appeared have received
sharp criticism. 72 

Despite the present record, though, I 
would not dismiss the potential contribu¬ 
tion that emotional intelligence may make 
to the study of intelligence. I suggest that 
these contributions will be greatest when 
we examine “will do” rather than “can do” 
performance, over a fairly long period of 
time, and when the performance being eval¬ 
uated requires interpersonal interaction, in 
face-to-face situations. A salesperson taking 
orders in a telephone call center for a ready¬ 
made clothing store does not have to have 
high emotional intelligence; a salesperson in 
an expensive clothing boutique does. 

5.5. Closing Comments on Models 
to Expand Beyond Psychometric 
Intelligence 

It is unlikely that dissatisfaction with con¬ 
ventional tests and testing will go away. 
While the critics have made some good 
points about the shortcomings of modern 
testing, they have not come up with good 
alternatives. If we are going to develop a sci¬ 
entific understanding of any topic, including 
human intelligence, there has to be some 
way of measuring what we are studying. Any 
claim that intelligence, as currently mea¬ 
sured, is not important simply flies in the 
face of the facts. However, progress depends 
upon going beyond the known. 

Consider another of Gardner's com¬ 
plaints about modern psychometric theory: 

Frankly I find it difficult to find a publicly-discussable
reason why researchers should
devote their energies to yet one more study
of the role of 'g' in arenas ranging from job
performance to longevity or yet one more
effort to document racial or ethnic differences
in intelligence.

Gardner, 2006, p. 300

This is a pugnaciously worded claim that 
psychometric research is spinning its wheels. 
While I would not be as strong as Gardner 

72 Brody, 2004; Matthews, Zeidner, & Roberts, 2005. 


in making such a complaint, a great deal of 
current work is highly repetitive. The field 
could use a paradigm shift. I do not see 
how this can come about without the devel¬ 
opment of new methods of measurement. 
Where are we in this endeavor? 

Instead of standing on the faces of those 
who went before, as psychologists are wont 
to do, let us stand on their shoulders. Any 
new theory of intelligence will have to deal 
with the facts established by present psy¬ 
chometric findings. General intelligence, g 
as measured by conventional testing, exists, 
and is a trait associated with accomplish¬ 
ment (or lack of it) in virtually all aspects of 
our society, not just in scholastic endeavors. 

There are two acceptable models for the 
psychometric data: the three-stratum model 
and the g-VPR model, both described in 
the previous chapter. New models should 
build on the insights contained in these 
two approaches. Which one a new theorist 
should choose to emphasize depends to a 
great extent on what that theorist wants to 
accomplish. 

If the goal of the new theory is to explain 
how intelligence functions in society, as 
Ackerman and his colleagues wish to do, 
the three-stratum theory is a good starting 
point. If the goal is to probe more deeply 
into the processes of intelligence, especially 
to the point of establishing the biological 
basis for individual differences in cognitive 
power, the g-VPR model may be the best 
starting point. I elaborate on this idea in the 
next chapter. In either case there has to be 
a starting point. 

Attempting to develop theory without 
considering history is simply not a good idea. 
Of the three “new intelligences” reviewed 
here I believe that Gardner has disregarded 
the historical record, to the considerable 
detriment of his ideas. Sternberg has built 
on previous work. Two of his ideas, the 
need to study tacit knowledge and the need 
to incorporate research on creativity into 
research on intelligence, are clearly a step 
forward. Ackerman and his colleagues have 
built on previous research findings. Their 
work points a way toward a better under¬ 
standing of intelligence in everyday life. 




While these new theorists have not pro¬ 
duced any revolutionary new paradigms, 
they have outlined an agenda for the future. 
It turns out that it can be summarized 
by an e-mail message that I received just 
as I finished writing this chapter. The 
message, which was intended as a joke, con¬ 
trasted examinations in engineering and psy¬ 
chology. 

The question for psychology was: 

Based on your knowledge of their works, 
evaluate the emotional stability, degree 
of adjustment, and repressed frustrations 
of the following: Alexander of Aphrodisias,
Ramses II, Gregory of Nicia,
Hammurabi. Support your evaluation 
with quotations from each man's work, 
making appropriate references. It is not 
necessary to translate. 

The question for engineering was: 

The disassembled parts of a high- 
powered rifle have been placed on your 
desk. You will also find an instruction 
manual, printed in Swahili. In ten 
minutes a hungry Bengal tiger will enter 
the room. Take whatever action you feel 
appropriate. Be prepared to justify your
decision. 

Leaving aside the humor, these questions 
illustrate some important principles for the 
design of new assessment methods. 

The question about psychology is humor¬ 
ous because answering it requires retrieving 
and rearranging knowledge. The task itself, 
writing a report, is a natural one. Doing so 
requires the exercise of important cognitive 
skills that, by their nature, take time to 
execute. Such skills cannot be evaluated 
within the testing paradigm. Nevertheless, 
they are very important aspects of cognitive 
competence. 


What about the engineering question? 
Given proper training, assembling a rifle is 
not difficult. Soldiers and marines in mod¬ 
ern armies all learn to do this. During learn¬ 
ing the trick is to be able to visualize how 
the various parts should be moved to fit 
together. This is an exercise in visual rota¬ 
tion, the R dimension in the g-VPR model 
and the second-order Gv trait in the three- 
stratum model. 

What we have here is a test of domain 
knowledge, Gk in Ackerman’s expositions, 
and of visual-spatial skills. 

The problem is the tiger, not the rifle. 
The task must be done in a hurry (upesi in
Swahili). Interfering thoughts of tigers must 
be suppressed. The examinee has to manage 
his or her emotions in a way that facilitates 
the cognitive task rather than paralyzing it. 

A similar problem occurs during knowl¬ 
edge acquisition. Military boot camp, where 
you learn to assemble a rifle, is (intention¬ 
ally) an emotional experience. Learning is 
possible only if motivation can be main¬ 
tained and emotions can be suppressed. 
In boot camp the emotions to be sup¬ 
pressed may be fear and confusion. In school 
and the workplace different emotions are 
involved (boredom? self-consciousness?), 
but the point is the same. Understanding 
the development of intelligence as knowl¬ 
edge requires an understanding of how intel¬ 
ligence interacts with motivation, personal¬ 
ity, and situational constraints to produce 
understanding. 

The present attempts to expand intel¬ 
ligence only scratch the surface of these 
issues. Too much effort has been spent try¬ 
ing to claim revolutionary new advances or, 
even more destructively, setting up opposi¬ 
tions between cognitive abilities, personality 
characteristics, and environmental variables. 
Past cognitive abilities, personality charac¬ 
teristics, and interests interact to increase 
intelligence. Progress depends on our devel¬ 
oping an agenda for research that will 
describe the interaction. 


CHAPTER 6 


The Mechanics of Intelligence 


I believe I have an unfair edge over most of 
my colleagues right now - my mind works 
better than my mouth does. 

U.S. Senator Tim Johnson, from 
a speech in Sioux Falls, South 
Dakota, announcing his intention 
to return to the Senate following a 
brain hemorrhage that left his 
speech impaired, August 28, 2007 

Psychometric theories use what Sternberg 
has called a geometric analogy for intel¬ 
ligence; people are seen as varying along 
dimensions of intelligence in much the same 
way that they vary along the dimensions of 
height and weight. 1 Different theories iden¬ 
tify different dimensions, but the geometric 
analogy is maintained. This is a useful way of 
summarizing variations in intelligence across 
populations, but it has a serious shortcom¬ 
ing. The geometric analogy does not explain 
the processes that make up thinking. 

To see what this means, imagine two 
individuals, Ignatz and Horatio. We first 

1 Sternberg, 1990. 


determine their psychometric intelligence, 
in terms of the g-VPR model, and then ask 
them to attack the following two problems. 
The first problem makes use of the English 
rules that permit center embedding, putting 
one relative clause inside another. Ignatz and 
Horatio are presented with sentences of the 
form 

The rat ate the cheese 
The rat the cat chased ate the cheese 
The rat the cat the dog scared chased ate 
the cheese 

The rat the cat the dog the man owned 
scared chased ate the cheese. 

And so on 

and we determine at what point each person 
finds the sentence incomprehensible. 

The second problem is one of those 
(in)famous mathematics word problems: 

The distance between Seattle and San 
Francisco is approximately 750 miles by 
air. If an aircraft leaves San Francisco at 
9 a.m. and travels toward Seattle at 500 
miles per hour at an altitude of 37,000 feet,
while at the same time an aircraft leaves
Seattle and travels to San Francisco at
550 miles per hour at an altitude of 34,000
feet, when will they pass by each other, and
how far is this point from Seattle?

To complete the example, assume that 
we had conducted an experiment, not 
involving Ignatz and Horatio, in which prob¬ 
lems like these were included in a large 
psychometric study using the tests covered 
by the g-VPR model. If we knew Ignatz’s 
and Horatio’s scores on the g-VPR dimen¬ 
sions, we could determine the probability 
that each of them would correctly solve the 
two problems. We could not go beyond 
this because the psychometric approach 
does not provide models of the problem¬ 
solving process. Psychometrics does not tell 
us what the elementary steps are in the 
problem-solving process, or how individual 
differences in the ability to execute these 
steps translate into individual differences in 
problem-solving ability, including the abil¬ 
ity to take intelligence tests. To go further 
we have to look at another branch of psy¬ 
chology: cognition. 

6.1. The Cognitive Psychology 
Approach 

Suppose that we were to ask a modern cog¬ 
nitive psychologist to explain Ignatz’s and 
Horatio’s problem-solving behavior. The 
cognitive psychologist would want to look 
at the requirements of the problem-solving 
task. The psychologist would first observe 
that Ignatz and Horatio would have to be 
able to retrieve word meanings. This, in 
itself, is not a trivial task. Then there is more. 
In order to untangle center-embedded sen¬ 
tences a person has to have some way of stor¬ 
ing information about a noun phrase until it 
can be connected to its verb. To understand 

The rat the cat chased ate the cheese 

Ignatz and Horatio would have to have some 
way of temporarily storing the phrase the 
rat, processing the cat chased, attaching the 
result of the processing to the rat (i.e.,
the particular rat that the cat chased), and
connecting the modified idea to the verb 


phrase ate the cheese. The cognitive psychol¬ 
ogist would want to know what temporary 
storage capacities Ignatz and Horatio had 
for holding unresolved noun phrases, and 
whether they could organize that storage so 
that the noun phrases would be available 
when their verbs were encountered. 
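
One way to make this storage requirement concrete is a push-down stack: each unresolved noun phrase is stored until its verb arrives, and each verb attaches to the most recently stored noun. The sketch below is only an illustration under that assumption. The function name, the word lists, and the stack itself are hypothetical, and nothing here is a claim about how Ignatz or Horatio actually comprehend sentences.

```python
def attach_verbs(sentence, subject_nouns, verbs):
    """Pair each verb with the most recently stored unresolved noun.

    Returns the (noun, verb) pairs and the largest number of nouns
    that had to be held in temporary storage at one time.
    """
    stack = []        # unresolved noun phrases, most recent last
    pairs = []        # completed (noun, verb) attachments
    peak_load = 0     # largest number of nouns waiting at once
    for word in sentence.lower().split():
        if word in subject_nouns:
            stack.append(word)
            peak_load = max(peak_load, len(stack))
        elif word in verbs:
            pairs.append((stack.pop(), word))
    return pairs, peak_load


pairs, peak_load = attach_verbs(
    "The rat the cat the dog scared chased ate the cheese",
    subject_nouns={"rat", "cat", "dog"},
    verbs={"scared", "chased", "ate"},
)
# pairs is [("dog", "scared"), ("cat", "chased"), ("rat", "ate")]; peak_load is 3
```

Each added level of embedding raises the peak load by one, which is the quantity that a limited temporary store would constrain.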

Turning to the mathematics word prob¬ 
lem, the cognitive psychologist would first 
observe that Ignatz and Horatio have to 
retrieve word meanings, and have sufficient 
temporary storage space in memory so that 
they can analyze the meanings of the sen¬ 
tences. However, the passage does not con¬ 
tain deeply center-embedded sentences, so 
sentence comprehension itself would not 
present a challenge to their capacity for tem¬ 
porary memory storage. There would be 
another challenge. 

Using information extracted from the 
sentences, Ignatz and Horatio would have 
to develop an internal representation of the 
problem. This could either be in the form 
of equations or in the form of a “picture in
the head.” The cognitive psychologist would
probably remark that both representations 
require internal storage of information, but 
differ in the kinds of information stored. A 
representation in terms of equations is a 
symbolic representation that at least approx¬ 
imates syntactical analysis. A maplike rep¬ 
resentation requires a visual-spatial mem¬ 
ory store. The cognitive psychologist would 
also note that the problem requires answer¬ 
ing two questions, the time the two aircraft 
meet and the distance from Seattle. The 
problem solver has to set up a goal-subgoal 
structure to decide which problem to attack 
first, work on that problem, and then switch 
to the second problem. The cognitive psy¬ 
chologist would like to have an estimate of 
how many goals and subgoals Ignatz and 
Horatio can keep track of, and how long it 
takes them to switch from working on one 
goal to working on another. 
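
The goal-subgoal structure can be sketched for the aircraft problem itself. The example below reads the Seattle aircraft's speed as 550 miles per hour, treats the altitudes as irrelevant, and uses an arbitrary placeholder calendar date so that the 9 a.m. departure can be represented. The function and variable names are hypothetical.

```python
from datetime import datetime, timedelta


def solve_passing_problem(distance_miles, speed_from_sf, speed_from_seattle, departure):
    # Subgoal 1: when do the aircraft pass? They close the gap at the sum of their speeds.
    closing_speed = speed_from_sf + speed_from_seattle
    hours_until_passing = distance_miles / closing_speed
    passing_time = departure + timedelta(hours=hours_until_passing)

    # Subgoal 2: how far from Seattle? This reuses the result of subgoal 1.
    miles_from_seattle = speed_from_seattle * hours_until_passing
    return passing_time, miles_from_seattle


# The date is a placeholder; only the 9 a.m. departure time comes from the problem.
passing_time, miles_from_seattle = solve_passing_problem(
    750, 500, 550, datetime(2000, 1, 1, 9, 0)
)
# passing_time falls at about 9:43 a.m.; miles_from_seattle is about 393
```

The second subgoal cannot be completed until the first has produced the elapsed time, which is why the solver has to hold an intermediate result while switching goals.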

Stepping back from the particulars of the 
two problems, cognitive psychologists offer 
explanations of behavior in terms of elemen¬ 
tary cognitive tasks (ECTs), such as retrieving 
the meaning of a letter or word, holding a 
piece of information in temporary storage, 
and switching attention from one cognitive
task to another. Information-processing
theories of intelligence attempt to relate 
individual differences in intelligence, in the 
broad sense of the ability to function in the 
world, to individual differences in the exe¬ 
cution of elementary cognitive tasks. 

The cognitive psychologist stresses that 
such explanations are bound to be incom¬ 
plete, because problem-solving ability will 
be determined both by the information the 
person has (knowledge) and how well he 
or she can manipulate information, in the 
abstract. An information-processing model 
addresses only the second question. Incom¬ 
plete as it is, it is an important question 
because information-processing capacities 
stand somewhere between psychometric 
models of intelligence and brain-based 
explanations of intelligence. 

To see this, return to the example of the 
center-embedded sentence. Knowing where 
Ignatz stands in the geometric space defined
by the g-VPR model tells us the probability 
that he will be able to understand a sentence 
with, say, three levels of embedding. Spec¬ 
ulating a bit, based on the discussion of the 
brain to be presented in the next chapter, 
we might observe that when Ignatz tries to 
understand a center-embedded sentence the 
frontal, anterior parietal, and left temporal 
parts of his brain show heightened activity. 
Horatio might be located at a different point 
in the g-VPR space and have a different 
probability of understanding the sentence, 
and while trying to understand the sentence 
Horatio might show different levels of acti¬ 
vation than Ignatz does. It is useful, even 
necessary, to have an explanation that stands 
somewhere between the psychometric and 
brain-based explanations, by explaining in 
functional terms how Ignatz and Horatio 
are limited by their information processing 
abilities. 

6.1.1. An Historical Note

The idea that individual differences in cog¬ 
nitive power are partially due to underly¬ 
ing differences in the ability to perform ele¬ 
mentary cognitive tasks is not new. In the 


mid nineteenth century Galton measured 
the speed with which people could per¬ 
form simple tasks, such as making a sim¬ 
ple movement in response to an auditory or 
visual signal, and attempted to find a cor¬ 
relation between reaction time and occupa¬ 
tional status. The results were not impres¬ 
sive, so he apparently dropped this line of 
investigation. In 1901 Clark Wissler, a grad¬ 
uate student working under the supervision 
of James McK. Cattell, attempted to find a 
relation between tasks similar to those Gal¬ 
ton had used and the grades of Columbia 
University students. This work was also con¬ 
sidered to be unsuccessful, and for many 
years the “revealed wisdom” was that there 
is no correlation between intelligence and 
performance on measures of information 
processing. 2 

In retrospect the conclusion was pre¬ 
mature. Galton and Wissler both expected 
to find very strong relationships between 
performance on elementary cognitive tasks 
(ECTs) and complex thought. Today we 
know that the relations are there, but 
that they are moderate. Neither Galton's nor
Wissler’s techniques for measuring reaction
times were close to modern standards.
Probably the biggest mistake they made was 
that they did not take into consideration 
variation in the execution of an ECT within 
the individual. They would, for instance, 
determine a person’s reaction time by aver¬ 
aging three or four attempts to execute 
the task, whereas today upward of a dozen 
measurements would be taken. (Even today 
cognitive psychologists sometimes criticize 
researchers interested in intelligence for bas¬ 
ing their measurements on too few trials.) 
Wissler’s study had low statistical power, 
which means that it was capable of detecting 
only large relationships. All in all, by today’s 
standards the research was so poorly done as 
to be virtually useless. 3 
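
The cost of averaging too few trials can be illustrated with the standard error of a mean, which shrinks with the square root of the number of trials. The sketch below assumes independent trials and a purely hypothetical within-person standard deviation of 60 milliseconds. Neither assumption comes from Galton's or Wissler's data.

```python
import math


def standard_error(within_person_sd_ms, n_trials):
    """Standard error of a mean reaction time based on n independent trials."""
    return within_person_sd_ms / math.sqrt(n_trials)


ASSUMED_SD_MS = 60.0  # hypothetical trial-to-trial variability within one person

se_four_trials = standard_error(ASSUMED_SD_MS, 4)      # 30.0 ms
se_sixteen_trials = standard_error(ASSUMED_SD_MS, 16)  # 15.0 ms
```

Under these assumptions, quadrupling the number of trials halves the uncertainty in each person's estimated reaction time.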

The deficiencies in the research were not 
appreciated at the time, so the perceived 
failure of the Galton-Wissler approach 
resulted in a virtual cessation of interest 

2 Jensen, 2006, pp. 5-7. 

3 Deary, 1994. 
in individual differences in elementary cognitive
tasks. Psychometric studies of intelligence
flourished, and, especially after
the mid-1950s, the study of information-processing
models, under the title cognitive
psychology, became the dominant method
for studying human thought, without regard 
for individual differences. These fields were 
pursued as separate endeavors, however, 
involving different investigators and pub¬ 
lication in different journals. There were 
a few calls for a rapprochement, notably 
in a paper by the educational psychologist 
Lee Cronbach in 1957. 4 However, very little 
was actually done until 1973, when my col¬ 
leagues Nancy Frost and Clifford Lunneborg 
and I published a series of studies on the 
information-processing correlates of verbal 
and mathematical reasoning. 5 This work was 
followed by a blizzard of studies, certainly 
not limited to my laboratory. 

6.1.2. Modern Information-Processing 
Models 

In order to understand the attempts to 
connect information-processing models of 
thought to intelligence we must understand 
modern theories of cognition as information 
processing. These theories owe a great deal 
to concepts developed for designing digital 
computing systems. However, the modern 
view in no way implies that the digital com¬ 
puter can be used as a model of the mind. 
Rather, the contention is that any compu¬ 
tation, including thought, has to rely on 
certain elementary actions, and that study¬ 
ing performance on the individual actions 
is a useful way to proceed when studying 
cognition. 

Any computing device much more complicated
than a light switch has to be
able to do three things: sense the environment
(perception), classify the environment
into states relevant to the device (categorization),
and relate these classifications
to previously stored information (memory
retrieval). The result of these computations

4 Cronbach, 1957. 

5 Hunt, Frost, & Lunneborg, 1973. 
is an internal representation of the current
situation as interpreted in the light of memory
of past situations. The internal representation
may then be used to select a response
(decision making). If the computing device
operates continuously, in a dynamic, changing
environment, it has to be able to maintain
a record of the situations it has encountered
(as interpreted), the responses that
have been made, and the results of the 
responses. Therefore, we have to be con¬ 
cerned with the storage and retrieval of 
information in long-term memory, in a way 
that ensures its accessibility when needed. 

Although one can conceive of a robot 
mind that executed each of the four tasks, 
perception through decision making, in 
serial order, in the human brain they are 
interleaved, with a great deal of feed¬ 
back between them. A model of how this 
exchange of information takes place is called 
a cognitive architecture. Psychologists who
study human information processing have 
converged on some version of the Black¬ 
board model of cognitive architecture. 6 A 
version of this model is shown in Figure 6.1. 
The components of the model are as follows: 

1. Information from the environment is 
sensed, and then classified into progres¬ 
sively higher-order categories, through 
arousal of related information in long¬ 
term memory. Long-term memory is 
not thought of as a static storage pro¬ 
cess, as is the case in computer systems. 
Items in long-term memory exist in vari¬ 
ous states of activation, depending upon 
how frequently and how recently they 
and related pieces of information have 
been attended to. Therefore, the inter¬ 
pretation of a percept will be influenced 
by the context in which it occurs. 

2. The information from the environ¬ 
ment, together with relevant informa¬ 
tion retrieved from long-term memory, 
is placed in working memory, a term 

6 Anderson, 1996; Hunt & Lansman, 1986; Meyer & 
Kieras, 1997; Newell, 1990. These references sum¬ 
marize the ideas that the various authors had put 
forward over a number of years. I certainly do not 
claim priority. 
Figure 6.1. The blackboard model of cognition. Information about
the current situation is held in a temporary working memory store. 
Sensory input enters working memory if attention is being directed 
to the appropriate input channel. Information on the blackboard and 
from the sensory input channel activates problem-solving procedures 
that have been stored in long-term memory. These place further 
information in working memory. Sensory information may also enter 
long-term memory directly, thus lowering the threshold required to 
activate some problem-solving procedures or, in some cases, 
initiating an action without placing information on the 
blackboard. 


introduced by the British psychologist 
Alan Baddeley, 7 and widely accepted 
today. There further processing results 
in an internal representation of the cur¬ 
rent situation, as interpreted. The 
internal representation may include 
interpretations in contexts, and the 
identification of goals and subgoals in 
problem-solving. In totality, the processes
acting on the internal representation
are referred to as executive processes,
whose role is to update the internal
representation and establish priorities
for action. One of the most important
subsets of the executive processes consists of
the attentional processes that highlight
some pieces of information and suppress 
others. 

3. Organizing information in this way 
enables a person to appear to be deal¬ 
ing with two or more tasks simulta¬ 
neously - for example, talking while 
driving an automobile. The impres¬ 
sion of simultaneity is something of 
an illusion, as both tasks will compete 

7 Baddeley, 1986. 


for space in working memory and for 
attention. When two tasks are done 
together, therefore, close examination 
almost always shows that one or both 
of the tasks are performed less well than 
they would be if performed alone. 

4. Working memory acts as a “blackboard” 
that broadcasts information into long¬ 
term memory. Alternatively, you can 
think of long-term memory processes as 
actively watching working memory to 
see if their cue has been called, rather 
like actors at a stage production. 

5. The storage section of working mem¬ 
ory is thought of as being divided into 
modality-specific sections for linguis¬ 
tic, auditory-nonlinguistic, and spatial/ 
visual information. There is no implica¬ 
tion that working memory is located at 
a single place in the brain. It is a system 
resulting from the integration of pro¬ 
cessing at several different locations. 

6. Information in working memory, as 
interpreted, is consolidated in long-term 
memory. This takes time. Therefore, 
the probability that a piece of information
in working memory will be stored
in long-term memory is partially determined
by the time it remains in working
memory. Items briefly attended to are 
not likely to be remembered. 

7. From time to time the information in 
working memory will initiate a process 
that calls for a motor response - speak¬ 
ing or making a physical movement. 
While many information-processing 
psychologists have been quite con¬ 
cerned with response production, 8 psy¬ 
chologists interested in intelligence have 
done rather little work on this aspect of 
cognition. 
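
The broadcast idea in component 4 can be sketched in a few lines. In the sketch below, each stored procedure "watches" working memory for its cue and posts a result when the cue is present. This is a deliberately minimal illustration with hypothetical names. It leaves out activation levels, capacity limits, attention, and the modality-specific stores, and it is not an implementation of any of the published architectures cited above.

```python
def run_blackboard(initial_contents, procedures, max_cycles=10):
    """Let stored procedures watch working memory and post results to it.

    Each procedure is a (cue, result) pair: when every item in the cue
    is present in working memory, the result is added to working memory.
    """
    working_memory = set(initial_contents)
    for _ in range(max_cycles):
        posted = {
            result
            for cue, result in procedures
            if set(cue) <= working_memory and result not in working_memory
        }
        if not posted:  # no procedure found its cue on the blackboard
            break
        working_memory |= posted
    return working_memory


# Hypothetical procedures for the aircraft word problem.
procedures = [
    (("speed from San Francisco", "speed from Seattle"), "closing speed"),
    (("distance", "closing speed"), "time until passing"),
    (("time until passing", "speed from Seattle"), "distance from Seattle"),
]
final_contents = run_blackboard(
    ["distance", "speed from San Francisco", "speed from Seattle"], procedures
)
```

No procedure calls another directly. Each one is triggered only by what currently sits in working memory, which is the sense in which the blackboard coordinates processing.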

Cognitive psychologists are concerned 
with the typical characteristics of each of 
the processes in the cognitive architecture. 
Psychologists interested in intelligence want 
to know how individual differences in these 
characteristics are related to individual dif¬ 
ferences in intelligence, as measured by psy¬ 
chometric models. Using the g-VPR model 
as a guide, we will first look at two aspects 
of the architecture that appear to be related 
to individual differences in general rea¬ 
soning, and then examine specialized pro¬ 
cesses related to verbal and visual-spatial 
reasoning. 

6.2. The Speed of Mental Processing 

The blackboard model depicts thinking as 
the shuttling of attention from one piece of 
information to another. To illustrate this, 
consider the common task of determining 
whether a view of a scene matches a descrip¬ 
tion. The task could be as simple as answer¬ 
ing the question Are my car keys on the dining 
room table? The listener must form an inter¬ 
nal representation of the linguistic state¬ 
ment, form an internal representation of a 
visual scene, and compare the two. Different 
people use different strategies for comparing 
sentences to pictures, 9 but no matter which 
strategy they use, they have to shift back and 

8 For instance, Meyer & Kieras, 1997. 

9 MacLeod, Hunt, & Mathews, 1978. 
forth from processing one piece of information
to another.

All this has to happen quickly enough 
that the thinker can keep up with the envi¬ 
ronment. If a speaker is speaking at a normal 
conversational tempo, listeners are expected 
to keep up. See the example and elaboration 
in panel 6.1. 

There is also an internal demand for speed 
in mental processing. Competing lines of 
thought vie for the limited storage space 
in working memory, and the competition 
is fierce. Have you ever been introduced to 
someone, immediately embarked on a con¬ 
versation, and then realized that you did 
not know the person’s name, even though 
the two of you had just been introduced? If 
information in working memory is usurped 
by another task, the original contents of 
working memory may be lost before it can 
be moved to long-term memory. 

These considerations show why speed of 
processing should be important in thought. 
But is there a single speed of processing, or 
different speeds for different ECTs? 

The reason for believing in processing 
speed as a pervasive trait is that all men¬ 
tal actions depend on neural processing. If 
people differ in the speed of processing
at the neural level, then there
should be pervasive individual differences in 
processing speed no matter what the task. 10 
Accordingly, speed of processing should be 
positively correlated with measures of g, the 
general reasoning factor. In order to test this 
hypothesis we have to find a way of measur¬ 
ing mental processing speed. 

6.2.1. Donders’s Paradigm as a Way of 
Measuring the Speed of Mental Processing 

Elementary cognitive processes cannot be 
measured in isolation. Even the simplest 
tasks will require multiple mental steps, and 
may mix cognitive and noncognitive pro¬ 
cesses. In the nineteenth century Galton 
asked people to strike a bag upon a sig¬ 
nal. He took the time interval between 

10 Jensen (2006) has developed this argument at considerable
length.
Panel 6.1. Intelligence and the Speed
of Language Comprehension 

The speed with which people can com¬ 
prehend simple speech is often taken as 
an indicator of intelligence. I will illus¬ 
trate with an anecdote and an actual 
demonstration. 

The anecdote is based on an inci¬ 
dent in my laboratory. Most of my lab¬ 
oratory research effort at that time was 
devoted to studying individual differ¬ 
ences in cognition in young adults, usu¬ 
ally college students. However, one grad¬ 
uate student was interested in mentally 
disabled children. In order to communi¬ 
cate with the participants in her studies 
she learned to . . . talk . . . very . . . slowly.
When she reported her results to her fellow
graduate students she . . . drove . . .
them . . . up . . . the . . . wall.


Recall Baddeley’s three-minute test of 
reasoning, described in Chapter 2.* People
were shown a picture and a statement,
and asked if the statement described
the picture. Two examples, one easy
and one hard, are 

Is the A after the B? AB

Is the B not before the A? BA 

The score on this test was the num¬ 
ber of items that could be answered in 
three minutes. It had a .59 correlation 
with scores on a formal intelligence test 
used to screen recruits in the British 
army. Tasks that require coordination 
of linguistic statements and visual infor¬ 
mation emerge as a way of evalua¬ 
ting g. 

* Baddeley, 1968. 


the signal and contact with the bag as a 
measure of speed of processing. A mod¬ 
ern cognitive psychologist would point out 
that this reasoning is flawed. The interval 
between the signal and the strike contains 
many subprocesses: detecting that a signal 
has been received, interpreting it, selecting 
a response, and making the movement. The 
processes of detecting and interpreting the 
signal and constructing the response are cog¬ 
nitive acts, but hand movement is at best 
semicognitive, as the time taken to move 
the hand would depend on properties of the 
muscular-skeletal system as well as the ner¬ 
vous system. Someone with arthritis would 
strike a bag very slowly, but arthritis does 
not affect cognition. Some way has to be 
found to isolate each of the several cognitive 
actions that lie behind an overt response. 

Franciscus Donders, a nineteenth- 
century Dutch physiologist, proposed a 
solution to this problem. Suppose that a 
task can be fractionated into a sequence of k 
steps that must be conducted in sequence. 
Suppose further that you can construct a 
second task that requires all the steps of 


the first task, and an additional (k + 1)st step
that has to be inserted somewhere in the 
series of steps required by the first task. The 
difference between the time required to 
execute the first task and the time required 
to execute the second task will be a 
measure of the time required to complete 
the inserted step. That is, if we let R(x) be 
the time required to complete process x, 
then 

R(inserted process) = R(task with insertion)
- R(original task). (6.1)

This brief explanation does not really do 
justice to Donders’s sophisticated ideas, but 
it is close enough for present purposes. 

Donders’s approach is based on three 
assumptions: that the processes involved 
must be executed in sequence, that for any 
pair of adjacent processes in the sequence 
the second process is not initiated until the first
is completed, and that the speed with
which a process is completed is independent 
of the speed with which any other process 
is completed. Collectively these conditions 
are referred to as the independent serial processing
assumptions. The literature in cognitive
psychology contains many discussions of
how one can ensure that independent serial 
processing has occurred, for the issue is sub¬ 
tler than you might think. 

Donders’s logic applies only to situations 
in which the entire task is completed suc¬ 
cessfully. The effect of an error in any one 
of the processes is not defined. Therefore, a 
careful investigator should restrict his or her 
analyses to trials on which a correct response 
is made. 
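
The subtraction logic, together with the restriction to correct trials, can be sketched as follows. The trial data are invented for illustration and the function names are hypothetical. The sketch also simply takes the independent serial processing assumptions for granted rather than testing them.

```python
from statistics import mean


def mean_correct_rt(trials):
    """Mean reaction time (ms) over correct trials only; error trials are dropped."""
    return mean(rt for rt, correct in trials if correct)


def inserted_step_time(base_trials, extended_trials):
    """Donders-style estimate: R(task with insertion) - R(original task)."""
    return mean_correct_rt(extended_trials) - mean_correct_rt(base_trials)


# Each trial is (reaction time in ms, whether the response was correct).
base_trials = [(300, True), (320, True), (280, True), (900, False)]
extended_trials = [(380, True), (400, True), (390, True), (250, False)]

estimate_ms = inserted_step_time(base_trials, extended_trials)  # 90 ms
```

Including the two error trials would change both means, which is one reason the restriction to correct responses matters.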

Unfortunately, researchers interested in 
intelligence have not always paid as much 
attention as they should either to the need 
to prove that the independent serial pro¬ 
cessing assumptions apply to their tasks, or 
to restricting their analyses to correct tri¬ 
als. In spite of these limitations, Donders’s 
paradigm has proven to be useful. Panel 6.2 
provides two examples of how Donders’s 
method of subtraction has been used to mea¬ 
sure the time required to access information 
in long-term memory. Panel 6.3 provides an 
example of how it can be used to measure 
the time needed to access information in 
working memory. 

Experimental paradigms have also been 
designed to measure the speed with which 
a person can make an elementary decision, 
without reference to memory. Such studies 
are called choice reaction time (CRT) tasks. 
At the beginning of the experiment the par¬ 
ticipant is told that one of a small number of 
directly perceivable stimuli may occur. For 
instance, the participant might be told that 
either a red or a green light could be pre¬ 
sented. The participant’s task is to identify, 
as quickly as possible, which of the possible 
stimuli has actually been presented. 

Choice reaction tasks differ in the num¬ 
ber of possible stimuli that might be pre¬ 
sented. Discriminating between red and 
green lights is an example of a two-choice 
reaction time task, denoted 2CRT. In a
more complicated experiment, if the choice 
were to be among red, green, yellow, and 
blue lights, the task would be a 4CRT task. 
In practice experiments seldom go beyond 
the 8CRT task. 


The 2CRT condition represents the time 
required to make a binary decision. On log¬ 
ical grounds, this should be central to cog¬ 
nition. However, the response time in the 
2CRT condition will also include the time 
required to make the response. This contam¬ 
inates cognitive and noncognitive processes. 
Suppose that an examinee had broken a fin¬ 
ger prior to the experiment. Such a person 
would be slow pressing a button, and hence 
have a long reaction time. It would hardly be 
fair to infer that someone’s brain was work¬ 
ing slowly because of a broken finger. 

To avoid such problems we can measure
the manner in which choice reaction times 
increase with the number of choices. One 
such analysis is described in panel 6.4. As 
the panel explains, an analysis of how reac¬ 
tion time changes as the number of choices is 
increased can provide an estimate of cogni¬ 
tive processing speed, independent of motor 
processes. 

6.2.2. Identification Paradigms 

Donders's approach depends upon the 
assumptions of strict serial processing and 
independence of processes from each other. 
An alternative approach is to construct a task 
that is believed to be primarily responsive to 
a particular cognitive process, and use it as 
a measure of the time required for that pro¬ 
cess, ignoring the fact that other processes 
may also influence the measure. 

One such paradigm is the lexical identifi¬ 
cation task, which is an attempt to measure 
the time required to retrieve information 
from long-term memory. An experimental
participant decides whether a letter string
does or does not constitute a common
word, for example, MALEC or CAMEL. 
Reaction time (RT) is measured. A gener¬ 
alization of this task is the semantic identi¬ 
fication task. In semantic identification an 
object name and a category name are pre¬ 
sented, for example, CAMEL ANIMAL. 
The participant indicates whether or not 
the object named is a member of the cate¬ 
gory. In such tasks the reaction time reflects 
both retrieval of information from long-term 
memory and the motor processes involved 
Panel 6.2. Using Donders's Logic
to Measure Access to Long-term 
Memory 

The following two examples show how 
Donders’s paradigm can be used to mea¬ 
sure the speed of access to long-term 
memory. 

Example 1. Category identification 

Task 1. The examinee sits in front of 
a computer monitor. A program presents 
single digits on a screen. The examinee is 
to depress the space bar as soon as a digit 
is presented. 

Task 2. The procedure is identical, 
except that the space bar is to be 
depressed only if an odd digit is pre¬ 
sented. 

The first task can be conceptualized 
as 

(1) Number presented. (2) Recognize 
number. (3) Press space bar. Let the time 
required to do this be RT(1).

The second task can be conceptualized 
as 

(1) Number presented. (2) Recognize
number. (3) Classify number and select 
response. (4) Press space bar if appropri¬ 
ate. Let the time required to do this be 
RT(2). 

Applying Donders’s logic, the time 
required to make a simple decision is 

Time to classify stimulus and select
response = RT(2) - RT(1).

Example 2. Name identification and 
physical identification 

This task was originally developed by 
Michael Posner and his colleagues at the 
University of Oregon.* Variants of it have 
been used widely in studies in cognitive 
psychology. 

The examinee is shown two letters on 
a computer screen and is asked to indi¬ 
cate, as rapidly as possible, whether or 


not the two letters are identical. Possible 
letter pairs are 
A A 
A B 
A a. 

The answer for these three pairs 
depends upon what is meant by “iden¬ 
tical.” Does this mean that they are phys¬ 
ically identical, or does it mean that they 
name the same letter? For the first pair the 
answer is “yes” under either interpretation, for physically
identical objects must have the same
name. For the second pair the answer is 
“no.” For the third pair, A a, the answer 
depends upon the instructions; the let¬ 
ters are physically different but refer to 
the same letter of the alphabet. 

Now suppose that identification takes 
place in the following steps. 

Physically identical (PI) steps 

P1. Determine whether the two letters
are physically identical. 

P2. If they are, respond “yes”; otherwise 
respond “no.” 

Name identical (NI) steps 

N1. Determine whether the two letters
are physically identical. 

N2. If they are, respond “yes”; otherwise 
go to step N3. 

N3. Retrieve letter names. 

N4. Determine whether the two letters
have the same name. 

N5. If they do, respond “yes”; otherwise 
respond “no.” 

In order to perform under name- 
identical instructions, the observer has 
to retrieve the names of letter forms, 
an act involving retrieval of information 
from long-term memory. This takes time. 
College students take about eighty mil¬ 
liseconds longer to execute the identi¬ 
fication task under the NI than under 
the PI condition. Thus eighty millisec¬ 
onds can be taken as a rough estimate 
of the time required to access highly
overlearned information in long-term 
memory. 

The eighty milliseconds is the average 
identification time taken by college stu¬ 
dents in general. If college students are 
split into two groups, one with high ver¬ 
bal comprehension test scores and one 
with low test scores, students in the high 


test score group take about sixty milliseconds
longer to deal with the NI condition
than with the PI condition, and students
in the low test score group take
about ninety milliseconds longer.†

* Posner et al., 1969.

† Hunt, Frost, & Lunneborg, 1973; Hunt,
Lunneborg, & Lewis, 1975.


Panel 6.3. The Short-term Memory 
Scanning Task 

The short-term memory scanning task 
was developed in the 1960s by Saul 
Sternberg, then a researcher at the 
(now disbanded) Bell Laboratories of
the American Telephone and Telegraph 
Company.* Since then the paradigm has 
been widely used by many researchers. 
The procedure is as follows: 

A memory set of from one to six digits 
or letters is presented - for example, 

A Q X L. 

The memory set is removed and a 
probe item is presented - for 
example, T. 

The examinee indicates, by pressing a 
button, whether or not the probe was 
a member of the memory set. In this 
case the answer is NO. 

The procedure is then repeated, with 
a new memory set. 

The scoring procedure illustrates the 
use of mathematical modeling. Let k be 
the number of items in the memory set, 
and let RT be the reaction time, that is, 
the time between the onset of the probe 
item and the point at which the exami¬ 
nee makes a response. We consider only 
correct trials. It is considered important 


to train participants until they make very 
few errors. Otherwise, individual differ¬ 
ences in trading off speed and accuracy 
will influence the results. Unfortunately, 
this caution has not always been followed 
in research on intelligence. Empirically 
it has been found that after individuals 
have received enough training that they 
make errors on only about 5% of the trials, 
RT increases linearly with the number of 
items in the memory set: 

RT = A + Bk, 

where A and B are positive constants. 
The B parameter can be thought of as the 
increment in time required to scan the 
memory set when memory set size is 
increased by adding one item. This inter¬ 
pretation makes it a reasonable measure 
of the speed of access to information in 
short-term memory. The A parameter, 
by contrast, reflects all of the processes 
involved, including identifying the probe 
item and making the response. 

Sternberg’s analysis of memory scan¬ 
ning is a variant of Donders's method, 
because the estimate of the B parameter 
is sensitive to the successive differences 
in the times required for memory sets of 
one versus two, two versus three, and so 
forth. 
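
As a minimal sketch of how the A and B parameters might be estimated, the following fits a straight line to mean correct reaction times by ordinary least squares. The function name `fit_line` and the reaction times are hypothetical illustrations, not data from Sternberg's studies.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical mean RTs (milliseconds) for memory sets of one to six items.
set_sizes = [1, 2, 3, 4, 5, 6]
mean_rt_ms = [440, 478, 523, 560, 597, 642]

A, B = fit_line(set_sizes, mean_rt_ms)  # A: intercept, B: scan time per added item
```

Under the model, B is the quantity of interest for individual differences, since it isolates the cost of adding one item to the memory set from everything lumped into A.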

* S. Sternberg, 1966. 
Panel 6.4. Jensen’s Choice Reaction 
Time Procedure 

Arthur Jensen developed a choice reac¬ 
tion time procedure that is widely used in 
the study of individual differences.* The 
examinee is seated in front of a display 
that consists of (a) a bank of eight lights, 
with buttons immediately below them, 
and (b) a “home” button in the center of 
the display. Depending on the condition, 
the examinee is told that any one of two, 
four, six, or eight lights may be lit. When 
one of the lights comes on, the examinee 
lifts his or her finger from the home but¬ 
ton and depresses the button under the 
light that has been lit. 

This procedure yields two measures 
on each trial. The decision time is the 
interval between the time the light is 
illuminated and the time the examinee 
releases the home button. The movement 
time is the interval between the release of 
the home button and the depression of 
the button under the illuminated lamp. 
The idea behind measuring each inter¬ 
val is that the decision where to move 
is made before the examinee's hand is 
lifted. Therefore, decision time reflects 
the time required by the cognitive pro¬ 
cess of selecting a response. Movement 
time reflects a motor movement. Accord¬ 
ingly, decision time should be a bet¬ 
ter measurement of cognitive processing 
speed than movement time. The analysis 


assumes that the independent serial pro¬ 
cessing assumptions are true. 

When people are asked to choose 
between k alternatives using an appara¬ 
tus like Jensen’s, after an initial practice 
period, their reaction time is a linear 
function of the logarithm of the number of 
choices, 

RT(k) = A + B ln(k), 

where A and B are positive real num¬ 
bers. This is an interesting observation 
in itself, because it is the function that 
would describe the operation of a deci¬ 
sion maker who was choosing between 
alternatives by making an optimally effi¬ 
cient sequence of binary choices between 
subsets of the alternatives. Therefore, B, 
the slope parameter, can be thought of as 
an estimate of the time it takes to decide 
which of two equally probable alterna¬ 
tives has occurred. The A parameter, by 
contrast, reflects general speed of pro¬ 
cessing, including movement time. 
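
A minimal sketch of fitting this logarithmic function is given below. The function name `fit_hick` and the reaction times are hypothetical; the fit is an ordinary least-squares regression of RT on ln(k). Note that with the natural logarithm, each doubling of the number of alternatives adds B ln(2) to the predicted reaction time.

```python
import math


def fit_hick(choice_counts, rts_ms):
    """Least-squares fit of RT = A + B * ln(k); returns (A, B)."""
    xs = [math.log(k) for k in choice_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(rts_ms) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rts_ms))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical mean decision times (milliseconds) for 2, 4, 6, and 8 lights.
A, B = fit_hick([2, 4, 6, 8], [320, 355, 377, 390])
added_per_doubling = B * math.log(2)  # predicted increase when k doubles
```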

Both Sternberg's task and Jensen’s task 
are based on Donders’s reasoning. In each 
case the argument is that reaction times 
increase regularly with increases in the 
complexity of the decision process. The 
increase in time should be smaller for 
more intelligent people. 

* Jensen, 2006. This reference summarizes over 
twenty-five years of research using the device. 


in making a response. Because the reaction 
times obtained in identification tasks mix 
times for cognitive and motor processes, 
the times themselves are not easily inter¬ 
pretable. Somewhat surprisingly, though, 
the way these tasks are used avoids the inter¬ 
pretation process. 

In cognitive psychology, without regard 
to individual differences, the typical exper¬ 
iment focuses on how experimental con¬ 
ditions change the reaction times in an 
identification task. For instance, one of the 
major uses of the lexical identification task 


is to study how observation of semantically 
related items changes the speed of identifi¬ 
cation of a target item. To take a frequently 
cited example, the word NURSE will be rec¬ 
ognized faster if the immediately preceding 
word was DOCTOR than it will if it was 
BUTTER. 

A similar logic applies to studies of indi¬ 
vidual differences. When we study the rela¬ 
tion of elementary cognitive processes to 
intelligence, we are interested in the dif¬ 
ferences in reaction times between people 
who have been indexed by their scores on 
intelligence tests, or by some other measure 
of intelligence. The correlation coefficient 
is a measure of these differences; if people 
who have high intelligence scores are con¬ 
sistently faster in lexical identification tasks 
than people who have low scores, then this 
is evidence that at least one of the processes 
required for lexical or semantic identifica¬ 
tion is associated with intelligence. And this 
is the case. In the population of college stu¬ 
dents lexical identification RT has a corre¬ 
lation of —.40 with scores on tests of ver¬ 
bal intelligence. 11 The negative relation is 
expected because short reaction times are 
associated with high test scores. The magni¬ 
tude of the correlation in a more hetero¬ 
geneous population than college students 
would presumably be higher. 

Nevertheless, it would be nice to obtain 
as direct a measure as possible of cognitive 
processing speed. To see this, consider some 
facts obtained from general experimental 
psychology. In the spirit of Donders, we can 
think of any cognitive process as consisting 
of the following steps: 

Perceive the stimulus -> Select a response 
-> Make a response. 

The middle step is what we normally 
think of as cognition; the first step reflects 
the sensory-perceptual apparatus, and the 
third step reflects motor processes. 

In a modern information-processing 
study the participant reacts to information 
presented on a computer screen by pressing 
one of the keys on the keyboard. Studies in 
perception suggest that it takes from twenty 
to fifty milliseconds or less to detect a sim¬ 
ple visual figure, depending on the illumina¬ 
tion and the complexity of the surrounding 
visual field. It takes a healthy college stu¬ 
dent about 250 milliseconds to press a but¬ 
ton indicating detection of a visual signal, in 
a situation in which the same button is to be 
pressed regardless of the identity of the stim¬ 
ulus. This condition minimizes the middle, 
cognitive step in the sequence. If a cognitive 
task is introduced, such as asking the student 

11 Palmer et al., 1985. 
to recognize whether or not CAMEL is a 
word, reaction time rises to 500-600 mil¬ 
liseconds. This means that of the roughly 
550 milliseconds taken for the decision task 
somewhere between 250-300 milliseconds, 
about half the response time, will be taken 
up by perceptual and motor processes. The 
other half is taken up by cognitive processes. 
Therefore, the reaction time provides a mea¬ 
sure of cognitive processing speed, albeit a 
contaminated one. 

This analysis ignores an important issue. 
In general, the longer a person waits to make 
a response in a choice task, the more likely 
they are to make the correct choice. This is 
called the speed-accuracy trade-off. Different 
people adopt different criteria for accuracy; 
one person may be satisfied if his or her 
responses are correct on 95% of the trials, 
another may prefer 98%, and a third may accept 
92%. Studies of the speed-accuracy trade-off 
have shown that above accuracy rates of 90% 
small differences in accuracy may be asso¬ 
ciated with substantial differences in reac¬ 
tion time. Therefore, if different participants 
adopt different criteria for acceptable accuracy, 
another source of variation has been 
introduced, further confusing the interpre¬ 
tation of correlations between CRT and IQ 
scores. 

6.2.3. The Inspection Time Paradigm 

Most of the problems that arise in inter¬ 
preting CRTs arise because a reaction time 
study lumps cognitive and motor process¬ 
ing together. These problems are avoided 
in the inspection time task. In the typical 
visual inspection time task two vertical lines 
are presented for a brief time, followed by 
a visual mask. The interval between the 
onset of the two lines and the onset of the 
mask is called the inspection time. Figure 6.2 
illustrates the procedure. The participant 
then indicates whether the line on the right 
was longer than the one on the left or vice 
versa. The experimenter adjusts the inspec¬ 
tion time until it is as short as it can be, 
while allowing the observer to maintain a 
fixed level of accuracy. This is usually set at 
75% correct responding. 
Figure 6.2. The inspection time paradigm. Two 
lines of unequal length are presented at time A, 
and displayed until time B. At time B the lines 
are replaced by a visual mask blanking out the 
stimulus. The observer then indicates which 
vertical line was longer. The time interval A-B is 
the inspection time. The experimenter adjusts 
the inspection time until the shortest time 
interval is found at which the observer can 
maintain a fixed level of accuracy, usually 75 to 
80% correct. 
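
The adjustment of inspection time can be sketched as a search for the shortest exposure that still meets the accuracy criterion. The sketch below is an idealization under stated assumptions: the observer is a hypothetical, noise-free logistic function (`accuracy`, with made-up `threshold_ms` and `slope_ms` parameters), and the search is a simple bisection. Real experiments estimate accuracy from noisy trials and typically use adaptive staircase procedures instead.

```python
import math


def accuracy(duration_ms, threshold_ms=40.0, slope_ms=8.0):
    """Hypothetical observer: accuracy rises from 0.5 (guessing) toward 1.0."""
    return 0.5 + 0.5 / (1.0 + math.exp(-(duration_ms - threshold_ms) / slope_ms))


def shortest_duration(target=0.75, low=1.0, high=500.0, tol=0.1):
    """Bisect for the shortest exposure whose accuracy meets the target."""
    while high - low > tol:
        mid = (low + high) / 2
        if accuracy(mid) >= target:
            high = mid   # criterion met: try a shorter exposure
        else:
            low = mid    # criterion missed: a longer exposure is needed
    return high


inspection_time_ms = shortest_duration(target=0.75)
```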


Inspection times vary substantially, 
depending upon the apparatus, the transient 
illumination, and the age and intelligence 
of the participants. Some young adults can 
maintain accuracy at an inspection time of 
30 milliseconds. Senior citizens may pro¬ 
duce inspection times of 80 to 90 millisec¬ 
onds. Within age groups and experimental 
conditions inspection times are negatively 
correlated with intelligence, that is, higher 
test scores are associated with shorter inspection 
times. Correlations in the -.3 to -.4 
range are typical. Various statistical corrections 
may raise this figure to -.6 to -.7. 

The inspection time paradigm avoids 
mixing cognitive process and motor 
responding. However, it does so at the 
expense of minimizing the cognitive pro¬ 
cesses involved, for the task is largely a per¬ 
ceptual one. Therefore, it is not surprising 
to find that when inspection time measures 
are included in an analysis of an intelli¬ 
gence test battery, the inspection time mea¬ 
sures load on a “cognitive speediness” factor. 
This is a broad second-order factor in the 
three-stratum model. It has been suggested 
that if we broaden the study beyond analy¬ 
sis of results from young adults, inspection 
time might also indicate an ability to control 
attention. 12 


12 Nettelbeck, 2001. 


6.2.4. The Relation between Processing 
Speed and Intelligence 

The correlations between intelligence test 
scores and various reaction and inspection 
time measures are typically in the —.2 to —.4 
range, which is enough to show that there 
is a common core to the test scores and 
information-processing measures, but small 
enough to show that speed of mental pro¬ 
cessing and psychometric intelligence are 
not identical. If the correlations are cor¬ 
rected for various mediating effects, such as 
the restricted range of intelligence test scores 
found in university undergraduates, the cor¬ 
relations may climb to the —.5 to —.7 range, 
but the points about overlap and noniden¬ 
tity remain. 

This is reasonable. Early in the mod¬ 
ern study of individual differences on 
information-processing tasks my colleagues 
Nancy Frost and Clifford Lunneborg and 
I pointed out that the study of individ¬ 
ual differences in information processing is 
quite different from the study of 
psychometrically defined intelligence. 13 We saw the 
two as related, but not identical, enterprises. 
Therefore, we pointed out that the success 
of the endeavor would come from combin¬ 
ing understandings of individual difference 
in intelligence, conceptually defined, from the 
two different perspectives of cognitive psy¬ 
chology and psychometric psychology. 

The testing procedures required to eval¬ 
uate elementary cognitive tasks are time- 
consuming and expensive. While there are a 
few laboratories where banks of computer- 
controlled test stations have been built, for 
the most part this research takes place using 
participant groups that range from a single 
individual to perhaps half a dozen people. 
Partly for this reason most of the research 
on individual differences in elementary cog¬ 
nitive tasks has been conducted using uni¬ 
versity undergraduates as participants. 

The fact that so many studies are con¬ 
ducted on populations of young adults 
has led to contradictory results and some 

13 Hunt, Frost, & Lunneborg, 1973. 
confusion in the literature. For instance, in a 
very well-done study of Spanish undergraduates, 
Roberto Colom and his colleagues 
found no reliable relationship between 
measures of information-processing speed 
and intelligence test scores. When they 
extended their study population to include 
high school students, a reliable relation¬ 
ship appeared. This study is notable because 
exactly the same procedures were used in 
each experiment. 14 An American experi¬ 
ment that contrasted information process¬ 
ing in university undergraduates and high- 
level mentally retarded patients found that 
both the evidence for a pervasive speed fac¬ 
tor and the relation of that factor to intel¬ 
ligence test scores were much stronger in 
the mentally challenged group than in the 
university undergraduates. 15 In perhaps the 
largest study done to date, a Scottish repre¬ 
sentative survey of people from their late 
teenage years to their sixties found cor¬ 
relations between reaction times and per¬ 
formance on a complex arithmetic task in 
the -.3 to -.4 range. 16 

The potential size of the effects achieved 
by expanding the study of the information 
processing-intelligence relationship beyond 
the ubiquitous college population should 
not be underestimated. If we compare “fast” 
populations, such as college students, to 
“slower” populations, such as the elderly or 
mildly mentally retarded individuals, esti¬ 
mates of processing speed may vary by any¬ 
where from 50% to a factor of five or more, 
depending upon the apparatus and parame¬ 
ter involved. 17 

There are also a number of other rea¬ 
sons for believing that current studies, done 
mostly on undergraduates, underestimate 
the strength of association between mea¬ 
sures of speed of information processing and 
intelligence test scores. A few of them are 
discussed in panel 6.5. 

14 Colom et al., 2008. See particularly the contrast 
between the results of experiment 1 compared to 
experiments 2 and 3. 

15 Detterman & Daniel, 1989. 

16 Deary & Der, 2005. 

17 Ibid.; Hunt, 1980, 1987; Salthouse, 1996. 


What are we to make of the moderate 
but nontrivial relation between mental pro¬ 
cessing speed and psychometric measures of 
intelligence? Two extreme positions could 
be taken. One is that the various elemen¬ 
tary cognitive tasks evaluate the speeds of 
different mental processes, each of which 
contributes to cognition. Under this view, 
for instance, the lexical identification task 
and the inspection time task are measuring 
different things. Therefore, each of the pro¬ 
cesses evaluated should make its own contri¬ 
bution to the prediction of performance on 
complex reasoning tasks, such as are present 
on intelligence tests. Another implication of 
this view is that any measure of processing 
speed that contains a substantial motor com¬ 
ponent should have markedly lower corre¬ 
lations with complex reasoning tasks than 
would measures associated largely with cog¬ 
nitive processes. 

At the other extreme, it might be that 
all of the different information-processing 
measures reflect a fundamental speediness 
property of a person’s nervous system. If 
this is the case, all the different 
information-processing measures would be regarded as 
measures of the same pervasive property 
and would have similar correlations with test 
scores. 

The truth seems to lie somewhere in 
between. In general, overall measures of 
performance, such as the mean CRT in 
Jensen’s paradigm, tend to have somewhat 
higher correlations with scores on intelli¬ 
gence tests and complex reasoning tasks 
than do derived measures of basic process¬ 
ing, such as the slope parameter, although 
the latter measures should, in theory, eval¬ 
uate a specific cognitive process. In fact, 
the most important variable seems to be 
the total time required to complete the 
information-processing task. The longer the 
time required, the higher the correlation 
between performance on the task and on 
an intelligence test. 18 This is what would be 
expected on the assumption that all mea¬ 
sures tap a single mental speed trait, for then 

18 Jensen, 2006. 
Panel 6.5. Practical Limits that 
Obscure the Relation between 
Information-processing Speed 
and Intelligence Test Scores 

The study of the role of mental¬ 
processing speed in intelligence is made 
difficult by some logistical restrictions. 

When cognitive psychologists conduct 
reaction time experiments they almost 
always provide participants with a sub¬ 
stantial number of practice trials. This 
is done to stabilize performance prior to 
measuring such parameters as semantic 
access or short-term memory scanning 
rates. The reason for doing so is that the 
cognitive psychologist wants to be sure 
that the procedure is measuring a stable 
process in each participant. 

Psychologists studying individual dif¬ 
ferences need to have a large number of 
participants in order to obtain adequate 
statistical power for the analysis of cor¬ 
relations. This makes taking great care to 
obtain reliable measurements from each 
participant an expensive proposition. As 
a result, many studies of the relation 
between performance on elementary cog¬ 
nitive tasks and psychometric test scores 
are based on studies in which, by the 
standards of cognitive psychology, the 
participants simply did not have enough 
practice. 

If the participants are trying to fig¬ 
ure out how to use the testing appara¬ 
tus at the time that measurements are 
being taken, an investigator could obtain 
a spuriously high correlation between 
the information-processing measure and 
an IQ score, because high-IQ individ¬ 
uals are quicker at finding out how to 
work the apparatus. Alternatively, the 
correlation might be spuriously decreased 
because individual performance on the 
information-processing measure will not 
have stabilized at the time of measure¬ 


ment. All we can say is that this feature 
of many experiments makes interpreta¬ 
tion of the results confusing. 

There is also a question of reliability. 
Psychometric tests of intelligence have 
internal consistency statistics (split-half 
reliabilities or Cronbach's α statistic) in 
the .80-.95 range. Reaction time (RT) 
measures typically have even higher anal¬ 
ogous correlations, often in the .90-.95 
range. Within a test session RT measures 
are highly reliable. 

Things are quite different if we exam¬ 
ine reliability across sessions. The cor¬ 
relation between equivalent measures of 
a conventional psychometric intelligence 
test, taken a few days apart, will typically 
fall in the .8-.9 range. Taken together 
with the internal consistency statistics, 
this shows that the trait being measured 
is quite stable over intervals of days or 
weeks. RT measures show much greater 
day-to-day variability. Test-retest correlations 
can drop to .6 at intervals of only 
one or two days.* Because the within- 
day reliability is much higher, the trait 
being measured must itself be variable 
from day to day. Such variability con¬ 
strains the upper value of the correlation 
between an RT measure and a psycho- 
metrically measured trait. If we correct 
for this effect statistically, the estimated 
correlation between the traits mea¬ 
sured by information-processing mea¬ 
sures and intelligence test scores jumps 
from an observed .3-.4 to an estimated 
.6-.7. 
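
The text does not say which statistical correction produces these estimates. One standard choice is Spearman's correction for attenuation, which divides the observed correlation by the square root of the product of the two reliabilities. The sketch below assumes that formula; the function name `disattenuate` and the input values are illustrative only, and a correction of this kind alone need not reproduce the figures quoted above.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y):
    """Spearman's correction: estimated correlation between the underlying traits."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Illustrative values: observed |r| of .40, RT test-retest reliability of .6,
# intelligence test reliability of .8.
r_estimated = disattenuate(0.40, 0.6, 0.8)
```

The lower the across-session reliability of the RT measure, the larger the gap between the observed correlation and the estimated correlation between the underlying traits.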

It would be possible to conduct a study 
in which RT measures were taken over 
several days, and the trait common to 
them was correlated with psychometric 
test scores. Such a study would be fairly 
expensive and, as far as I know, has not 
been done. 

* Bittner et al., 1986. 






the longer tasks would provide a better mea¬ 
sure of individual differences in processing 
speed. 

However, different tasks do have some 
specificity. For instance, lexical decision 
times correlate most highly with verbal 
intelligence measures, while inspection time 
measures correlate most highly with nonver¬ 
bal tests. Nevertheless, the general trend is 
clear. General cognitive speediness is a reli¬ 
able trait, and is certainly a part of what we 
mean by “intelligence.” 

Some critics of research on mental pro¬ 
cessing speed have attacked the idea that 
“it is good to be fast” on the grounds that there 
are situations in which this is not true. 19 In 
fact, some societies distrust rapid respon¬ 
ders, believing that the more intelligent indi¬ 
vidual is the one who stops to weigh alter¬ 
natives before speaking. 

This objection misses the point of the 
research. Studies of individual differences in 
processing speed are concerned with how 
rapidly a person can grasp a situation, given 
a fixed amount of information. Deciding to 
make an overt response is a separate action. 
Being a bit flippant, there is a difference 
between noticing the logical flaw in your 
father-in-law’s political views and deciding 
to point out that flaw to him. These two 
separate acts of cognition both benefit from 
rapid thinking. Indeed, rapid thinking will 
probably help you suppress inappropriate 
or impolitic responses in many social set¬ 
tings. The assertion that there is an associa¬ 
tion between speed of information process¬ 
ing, measured by simple cognitive tasks, and 
the ability to reason in complex settings is 
not equivalent to saying that the intelligent 
individual necessarily has the fastest mouth 
in town. 


6.3. Working Memory and General 
Intelligence 

One of my colleagues once observed that 
since time is continuous the present just 

19 See, e.g., Sternberg, 1996. 


became the past. Therefore, perception is 
impossible; all psychology reduces to mem¬ 
ory and imagination. 20 Leaving aside imag¬ 
ination, a great deal of our thinking does 
depend on memory. 

Psychologists distinguish between two 
types of memory: immediate memory, 
which spans at most a few seconds, and 
long-term memory, in principle spanning 
the individual’s lifetime, although retrieval 
becomes difficult as time passes. The Black¬ 
board Model uses this distinction strongly. 
The blackboard is itself an immediate mem¬ 
ory device, and is thought of as being a 
severely limited resource, while long-term 
memory is conceived of as being essentially 
limitless. We now want to take a closer look 
at the structure of the blackboard. 

Battery-type intelligence tests, such as the 
WAIS, often contain separate subtests for 
short-term memory and long-term memory. 
Short-term memory is commonly evaluated 
by a digit span test. In a digit span test the 
examiner presents a randomly chosen series 
of three to eight digits, for example, “3, 9, 
7, 4.” The examinee’s task is to echo the 
digits immediately after they have been pre¬ 
sented. In the somewhat more challenging 
backward digit span task, the examinee must 
echo the digits in reverse order, “4, 7, 9, 3” 
in the example. 
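
The scoring of a single digit span trial can be sketched as follows. The function name `score_digit_span` is hypothetical, and the all-or-nothing scoring rule is an assumption; test manuals may score trials differently.

```python
def score_digit_span(presented, response, backward=False):
    """Return True if the response reproduces the digits in the required order."""
    expected = list(reversed(presented)) if backward else list(presented)
    return list(response) == expected


forward_ok = score_digit_span([3, 9, 7, 4], [3, 9, 7, 4])
backward_ok = score_digit_span([3, 9, 7, 4], [4, 7, 9, 3], backward=True)
```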

The forward digit span task is generally 
regarded as a test of a person’s short-term 
storage capacity. Digit span tests typically 
have a loading of about .5 on the general 
factor of an intelligence test battery. 21 Back¬ 
ward digit span introduces a processing as 
well as a memory component; instead of 
“reading out” memory the examinee has to 
reverse the order in which information has 
been received. The backward digit span is 
more highly correlated with total test scores 
than the forward span. This suggests that we 
ought not regard the blackboard-immediate 
memory system as just a storage system. Nor 
shall we. 

20 Poltrock, 1977. 

21 See Gignac, 2006, and O’Grady, 1983, for results on 

the WAIS, spanning more than twenty years. 
When Alan Baddeley introduced the idea 
of working memory, 22 he pointed out that 
when people solve problems they have to 
hold key pieces of information readily at 
hand, even though they are not at the imme¬ 
diate focus of consciousness. In addition, 
information has to be transformed, men¬ 
tal representations have to be built, and 
attention has to be focused on informa¬ 
tion relevant to the task at hand, while 
irrelevant information must be suppressed. 
Baddeley’s view of working memory, which 
is incorporated into the Blackboard Model, 
is that it is a mental workspace containing (a) 
modality-specific, passive memories (slave 
memories, in Baddeley’s terminology) for 
small amounts of auditory and spatial-visual 
information and (b) a somewhat underspec¬ 
ified “executive” that supervises the flow in 
and out of the storage systems. Variants of 
this model that include an additional slave 
memory for abstract semantic information 
have since been proposed. 23 

Baddeley’s “central executive” was not 
well specified. At the least, there has to be 
provision for a process that focuses attention 
on some information in working memory, to 
make it available for processing, while at the 
same time preventing (total) loss of information 
that is not currently at the focus of 
attention, but that has been accessed in the 
recent past and may be needed in the imme¬ 
diate future. 

This idea is encompassed by the term 
“situation awareness,” a phrase that has 
become popular in engineering psychology. 
The term refers to the need to be aware of 
what is going on about you, even if some 
aspects of the situation are not of imme¬ 
diate concern. For instance, an automobile 
driver has to keep aware of the positions 
of nearby cars, including those behind the 
driver's vehicle and hence not in immediate 
view. 

Similar requirements occur in many sit¬ 
uations, some quite far removed from driv¬ 
ing. When attorneys question witnesses dur¬ 
ing criminal trials they have to be aware 

22 Baddeley, 1986; Baddeley & Hitch, 1974. 

23 Hunt, 2002. 


of jurors’ reactions as they are interrogat¬ 
ing the witness. When people attack prob¬ 
lems in logic they have to be aware of differ¬ 
ent interpretations of the premises. When 
chess players consider a move they must 
evaluate different counters that the opponent 
might make. From an information-processing 
view, controlled manipulation 
of information in working memory is an 
extremely important part of cognition. 
Therefore, we would expect individual dif¬ 
ferences in the effectiveness of the work¬ 
ing memory system to be related to general 
intelligence. And they are. 

6.3.1. Measuring the Capacity of 
Working Memory 

The span task is a popular technique for 
evaluating working memory capacity. In a 
span task a person is asked to perform a 
series of simple actions (e.g., reading a sentence) 
while holding more and more information 
in memory as each sentence is read. 
The simple task will be called the primary 
task; the act of holding information in mem¬ 
ory is the secondary task. The idea is that 
the processing requirements of the primary 
task interfere with holding the information 
needed for the secondary task. Therefore the 
amount of information that can be held in 
memory in the face of the periodic inter¬ 
ruptions by the primary task provides a 
measure of the storage capacity of working 
memory. 

Two typical span tasks are shown in 
panel 6.6. When the primary task is reading, 
the span measure has a .5 correlation with 
measures of verbal comprehension, includ¬ 
ing the paragraph comprehension tasks 
that are often found on tests of verbal 
intelligence. 24 

When different types of span tasks are 
included in a comprehensive battery of rea¬ 
soning and information-processing tasks it 
turns out that (a) all the span tasks reflect 
a common underlying factor, referred to 
as working memory capacity (WMC), and 
(b) WMC has a correlation of .64 with a 

24 Daneman & Merikle, 1996. 
Panel 6.6. Memory Span Tasks 

The first, and still the most widely used, 
memory span task is the reading span task 
developed by two Carnegie-Mellon University 
researchers, Meredyth Daneman 
and Patricia Carpenter.* The examinee 
hears or is shown from one to five unre¬ 
lated sentences, one at a time, and then 
is asked to repeat the last word of each 
sentence. An example is 

He walked down Michigan Avenue in 
the face of the bitter wind. 

The restaurant was known for its creative 
preparation of fish dishes. 

All the cabinet wished the foreign secre¬ 
tary success on her vital trip. 

RECALL 

The examinee is then supposed to 
recall wind, dishes, trip. 

Randall Engle, a Georgia Institute of 
Technology professor, and his colleagues 
have generalized this finding.† They point 
out that, abstractly, the span task requires 
processing some information (the sen¬ 
tences) while holding other information 
(the words) in abeyance. Therefore, a 


(somewhat) nonverbal span task, process¬ 
ing span, can be created. In the following 
example the task is to read and verify the 
arithmetic equation, and then store the 
word that follows the answer. 

The examinee is shown 

IS (8/4) - 1 = 1? 

The examinee responds “yes.” The 
word bear is then presented. This is fol¬ 
lowed by the following sequences: 

IS (6*2) - 2 = 10? (Response: YES) 

Beans 

IS (10*2) - 6 = 12? (Response: NO) 

Dad 

RECALL 

The examinee replies “bear, beans, 
dad.” 

Span tasks similar in structure can be 
constructed for a variety of operations in 
other domains, such as counting, keeping 
track of movements, and imagined rota¬ 
tion of visual objects. 
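
The arithmetic example above can be sketched in a few lines. Each item pairs an equation to verify with a word to store. The function names and the serial-order recall scoring are assumptions made for illustration; they are not taken from the published procedures.

```python
# Each item: (computed value of the left-hand side, claimed value, word to store).
items = [
    ((8 / 4) - 1, 1, "bear"),
    ((6 * 2) - 2, 10, "beans"),
    ((10 * 2) - 6, 12, "dad"),
]


def verification_answers(items):
    """Primary task: answer YES if the equation is true, NO otherwise."""
    return ["YES" if value == claimed else "NO" for value, claimed, _ in items]


def recall_score(items, recalled):
    """Secondary task: count words recalled in their original serial positions."""
    targets = [word for _, _, word in items]
    return sum(1 for target, word in zip(targets, recalled) if target == word.lower())


answers = verification_answers(items)
score = recall_score(items, ["bear", "beans", "dad"])
```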

* Daneman & Carpenter, 1980. 
† Engle et al., 1999; Kane & Engle, 2002; Kane et al., 
2001, 2004. 


general reasoning factor extracted from con¬ 
ventional intelligence tests. 25 

Span tasks are a special case of dual tasks, 
tasks in which people are asked to attend to 
at least two streams of information at once. 
A person who listens to the radio while driv¬ 
ing a car is performing a dual task. Many lab¬ 
oratory tasks are formalizations of this every¬ 
day paradigm. In a dual task attention has to 
be switched from one stream of information 
to another, and then back again, with mini¬ 
mum loss of information during the switch. 
Dual tasks call on both the storage and attentional
control aspects of working memory.

In a widely referenced series of studies by 
Patrick Kyllonen and his colleagues at the 
US Air Force’s Armstrong Laboratory, Air 

25 Kane et al., 2001. 


Force recruits performed a number of work¬ 
ing memory tasks, including span tasks, and 
also took highly g-loaded tests of reasoning. 
The recruits’ scores on the ASVAB were 
also available. A common factor extracted 
from the working memory tasks predicted 
virtually all the variance extracted from the 
reasoning tests and from the general factor 
on the ASVAB test. 26 Similar findings have 
been reported by other investigators. This 
work is important for two reasons. First, 
the populations studied included US Air 
Force enlistees, thus broadening the studies 
beyond the usual range of college students. 
(While Air Force enlistees generally do not 
have the test scores of college students, 

26 Kyllonen & Christal, 1990; Kyllonen & Stephens,
1990.
they are of somewhat above-average ability
compared to the general population.) Sec¬ 
ond, the number of different working mem¬ 
ory tasks used ensured that the result did 
not depend upon the unique information¬ 
processing requirements of particular tasks. 

The latter finding poses something of an 
intellectual challenge. As the Spanish psychologist
Roberto Colom put it:

Working memory comprises the functions 
of focusing attention, conscious rehearsal, 
and transformation and mental manipu¬ 
lation of information. 

Colom et al., 2004, p. 277

Kyllonen’s work and the studies that followed
it show that working memory, a complex
information-processing concept, is related
to general intelligence. But what part
of working memory is crucial? Ian Deary, a 
professor at the University of Edinburgh, 27 
has pointed out that we do not advance 
understanding by showing that one mysteri¬ 
ous concept is linked to another. 

There are two ways to reply to Deary’s 
objection. One is to show that one, two, or 
all of the information-processing functions 
that comprise working memory make sepa¬ 
rate contributions to the relationship with g. 
That is, we reduce the intelligence-working 
memory link to links between intelligence 
and the component parts of working mem¬ 
ory. The other is to accept the linkage at 
the intelligence-working memory level, and 
challenge the need for further reduction. 

In order to discriminate between these 
possibilities we need to have separate mea¬ 
sures of Colom and colleagues’ four aspects 
of working memory. We require: 

(a) Memory span tests, to evaluate the abil¬ 
ity to store information while processing 
another concurrent task. 

(b) Tests that evaluate short-term storage 
without processing - for instance, recall 
of a short string of digits without any 
intervening or concurrent processing.

(c) Tests that evaluate the ability to con¬ 
trol attention without any memory 

27 Deary, 2000. 


component. An example of such a test 
is shown in panel 6.7. 

(d) Tests that evaluate processing speed, 
such as a choice reaction time task. This 
is considered necessary because speed 
of stimulus identification, decision mak¬ 
ing, and rehearsal of information are 
all involved in the working memory 
tasks. 

(e) One or more tests shown to be markers 
of reasoning, fluid intelligence, or gen¬ 
eral intelligence. 

Structural equation modeling can then be 
used to identify latent traits underlying each 
of the five types of tests, and to determine 
the relationships between the traits. 
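
One conventional way to write such a model is sketched below. The notation is an assumption introduced for illustration and is not taken from the studies discussed here: \xi_1, ..., \xi_4 stand for the latent traits behind test types (a) through (d), \eta for the latent reasoning trait behind the tests in (e), x_{ij} for the i-th observed test of type j, and y_k for the k-th reasoning test.

```latex
\begin{align}
  % Measurement model: each observed score reflects its latent trait plus error.
  x_{ij} &= \lambda_{ij}\,\xi_j + \delta_{ij}, & j &= 1,\dots,4, \\
  y_{k}  &= \kappa_{k}\,\eta + \varepsilon_{k}, \\
  % Structural model: the reasoning trait regressed on the four component traits.
  \eta   &= \gamma_1\xi_1 + \gamma_2\xi_2 + \gamma_3\xi_3 + \gamma_4\xi_4 + \zeta .
\end{align}
```

Under this sketch, the question of which component is "key" becomes a question about which of the path coefficients \gamma_j remain substantial when the latent traits are allowed to correlate with one another.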

Several such studies have been done, with 
somewhat confusing results. Within each 
of the studies a coherent picture emerges, 
often identifying a key component deter¬ 
mining the relation between working mem¬ 
ory and intelligence. Across studies there 
is very little agreement over what the 
component is. 

The array of studies has an international 
flavor. Studies of American university stu¬ 
dents produce results indicating that the 
key component is the ability to control 
attention. 28 Studies of Spanish high school 
and university students 29 indicate that the 
storage process is the key component. Stud¬ 
ies with German university students pro¬ 
duce results pointing toward “supervision,” 
which is described as the ability to keep 
track of the current state of variables that 
change over time - for example, the location 
of the car in the driving example given ear¬ 
lier. (This process is sometimes referred to 
as updating.) 30

It seems unlikely that working memory 
would work differently in Spain, Germany, 
and the United States. And just in case you 
think it might, a study that explicitly com¬ 
pared the architecture of information pro¬ 
cessing in two different countries showed 

28 Engle et al., 1999; Heitz, Unsworth, & Engle, 2005; 

Kane et al., 2001, 2004. 

29 Colom et al., 2008. 

30 Oberauer et al., 2003. 
Panel 6.7. An Attention-switching
Task 

Several different tasks have been devised 
to measure the ability to switch atten¬ 
tion from one concurrent task to 
another. They all follow this general 
outline. 

1. The observer is shown two
simultaneously presented streams of
stimuli, and is told to search for tar¬ 
gets on one. Stimuli are presented 
rapidly. 

2. Aperiodically the observer receives a 
signal indicating that he or she should 
either continue to search for targets 
on the stream being monitored or 
shift to the other stream. 

3. The dependent variable is the num¬ 
ber of targets missed immediately 
after the signal is sounded. This 
is taken as a measure of the time 
required to fix attention on the new 
target stream. 

The following example illustrates the 
procedure. The observer is told that he 
or she will see a sequence of pairs of 
letters presented on a computer screen. 
The letters will be presented reason¬ 
ably far apart, but both letters will be 
within the visual field. Only one of these 
sequences is to be monitored at any one 
time. 

The task is to press a button whenever 
a vowel appears in the position (right or 
left) currently being monitored. 


At the start, monitor the right-hand 
sequence. 

Subsequently, if a high tone is 
sounded, switch from the currently mon¬ 
itored sequence to the other (e.g., from 
the right to the left). If a low tone is 
sounded, continue to monitor the se¬ 
quence being monitored before the tone 
was sounded. 

Here is an example. Each line repre¬ 
sents a time period. 


Left-hand        Right-hand
Sequence         Sequence

R                L
K                X
O                R
Z                E (target)
T                K
(High tone sounds)
(target) I       J
N                S
(Low tone sounds)
Q                Y
(target) U       C


Doing well on this task hardly qualifies 
as intelligence, in the conventional mean¬ 
ing of the word. However, the ability to 
switch back and forth from one stream 
of information to another can be vital 
in certain jobs. Air traffic controller is a 
good example. Empirically, the ability to 
perform well on attention-switching tasks 
is statistically associated with scores on 
measures of fluid intelligence.* 

* Duncan et al., 1996.
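
The rules of the panel's example can be made explicit with a short Python sketch. It is a hypothetical illustration rather than the software used in any study: the event encoding and the function name are assumptions, and the vowels are taken to be A, E, I, O, and U.

```python
VOWELS = set("AEIOU")

def find_targets(events, start_side="right"):
    """Return (side, letter) for every vowel that appears in the monitored stream.

    `events` is a list of ("pair", left_letter, right_letter) or
    ("tone", "high" | "low"). A high tone switches the monitored side;
    a low tone leaves it unchanged.
    """
    side = start_side
    targets = []
    for event in events:
        if event[0] == "tone":
            if event[1] == "high":
                side = "left" if side == "right" else "right"
            continue
        _, left, right = event
        letter = (left if side == "left" else right).upper()
        if letter in VOWELS:
            targets.append((side, letter))
    return targets

# The sequence shown in the panel, one entry per time period.
EVENTS = [
    ("pair", "R", "L"), ("pair", "K", "X"), ("pair", "O", "R"),
    ("pair", "Z", "E"), ("pair", "T", "K"),
    ("tone", "high"),
    ("pair", "I", "J"), ("pair", "N", "S"),
    ("tone", "low"),
    ("pair", "Q", "Y"), ("pair", "U", "C"),
]

if __name__ == "__main__":
    print(find_targets(EVENTS))
```

Note that the O in the third time period is not a target, because it appears in the left-hand stream while the right-hand stream is being monitored.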


that the overall relations between structures 
were the same in each. The countries com¬ 
pared were Greece and China! 31 

In spite of the confusing results, I 
believe the findings have a rather simple 

31 Demetriou et al., 2005. Chinese children and ado¬ 
lescents were better than Greeks at tasks involving 
visual-spatial reasoning, but there was no difference 
in variables specifying the relation between working 
memory and reasoning. 


explanation. There is little doubt that work¬ 
ing memory is an important component of 
general intelligence, although the two are 
not identical, either in the sense of being 
perfectly correlated with each other statis¬ 
tically or in the sense of being conceptu¬ 
ally identical. 32 It is not surprising to find 

32 In addition to the studies cited here, my conclusion 
is based on an excellent review of the literature by 
Ackerman, Beier, & Boyle (2005).
that the constraining features of working
memory may differ in different popula¬ 
tions. For instance, studies in the ubiqui¬ 
tous university student population show rel¬ 
atively small relations between processing 
speed and intelligence test performance. If 
we extend the studies over the full adult
range, processing speed becomes an important
variable. Finally, confusion may result
because studies that try to isolate the most 
important aspect of the working memory- 
intelligence relationship may be trying to 
carry the analysis to a finer level of detail 
than is appropriate. I now elaborate on this 
point. 

6.3.2. Brunswickian Symmetry:

Working Memory and Intelligence 
as System Effects 

In a talk given at the University of Wash¬ 
ington, the German psychologist Werner 
Wittmann made the point that when we 
compare two sets of behaviors it is important 
to maintain the same level of complexity for 
each behavior. Wittmann referred to the balance
as Brunswickian Symmetry, by analogy
to some of the ideas of Egon Brunswick con¬ 
cerning decision making. Wittmann’s argu¬ 
ment can profitably be applied to studies of 
the relation between working memory and 
general intelligence. 

The terms general intelligence and work¬ 
ing memory both refer to systems of psy¬ 
chological functions, rather than individual 
functions. Consider the case of what many 
consider to be the best single measure of 
general intelligence, the progressive matrix 
item. A moderately difficult item of this sort 
is shown in Figure 6.3. What does it take to 
solve this sort of problem? 

You have to be able to pick apart the 
individual elements of the figures, recogniz¬ 
ing that each figure has certain attributes, 
rather than reacting to the figure as a whole. 
You have to be able to recognize that 
each attribute of a component of the fig¬ 
ure appears once in each row and that each 
attribute of another component appears 
once in each column. Because visual atten¬ 
tion is limited, you must be able to hold 


in memory information about recognizing 
an item in a row as you look down the 
rows and across the columns. Then you have 
to fill in the missing pattern. Finally, you 
have to hold the correct pattern in short¬ 
term memory as you search the alternative 
answers provided, until you find one that 
matches the pattern. There is no one neces¬ 
sary working memory function. All of them 
are required. 
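
The "once in each row, once in each column" rule can be illustrated with a small Python sketch. The matrix below is hypothetical and is not the item in Figure 6.3; the attribute names, the rule assignments (shape by row, shading by column), and the function names are all assumptions made for the example.

```python
SHAPES = ("circle", "square", "triangle")
SHADINGS = ("black", "gray", "white")

# Hypothetical 3 x 3 item: each cell is (shape, shading); None marks the blank.
MATRIX = [
    [("circle", "black"), ("square", "gray"), ("triangle", "white")],
    [("square", "white"), ("triangle", "black"), ("circle", "gray")],
    [("triangle", "gray"), ("circle", "white"), None],
]

def missing_value(seen, allowed):
    """Return the single allowed value that has not been seen yet."""
    remaining = set(allowed) - set(seen)
    if len(remaining) != 1:
        raise ValueError("the rule does not determine a unique value")
    return remaining.pop()

def solve(matrix):
    """Fill the blank cell using 'shape once per row' and 'shading once per column'."""
    row, col = next(
        (r, c)
        for r, cells in enumerate(matrix)
        for c, cell in enumerate(cells)
        if cell is None
    )
    # Information that must be held in memory while scanning the row ...
    shapes_in_row = [cell[0] for cell in matrix[row] if cell is not None]
    # ... and while scanning the column.
    shadings_in_col = [
        matrix[r][col][1] for r in range(len(matrix)) if matrix[r][col] is not None
    ]
    return (missing_value(shapes_in_row, SHAPES),
            missing_value(shadings_in_col, SHADINGS))

if __name__ == "__main__":
    print(solve(MATRIX))
```

Even in this toy version the solver has to decompose each figure into attributes, track two intermediate lists, and combine them, which mirrors the point that no single working memory function does the job alone.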

Working memory and intelligence are 
both concepts that refer to the ability to 
coordinate a system of elementary functions 
into a coherent whole. There are relations 
at the system level, but trying to establish 
relations between the elementary functions, 
below the level of overall system function¬ 
ing, is not likely to be useful. The quest for a 
single information-processing function that 
explains intelligence is no more likely to succeed
than the quest for the Holy Grail. 33

6.4. Verbal Comprehension 

Verbal comprehension is the process of under¬ 
standing written or spoken communication. 
In adult readers, the two processes are highly 
correlated. 34 Comprehension is somewhat 
different from production, although, provided
that one does not demand elegance in
writing or speaking, the abilities to understand
and to generate linguistic messages
are closely related. Tests of verbal compre¬ 
hension are contained in almost all battery- 
type intelligence tests. For instance, people 
who take the SAT have to show that they 
can extract meaning from passages that are 
roughly the length of a newspaper editorial, 
and that deal with serious topics. 

Over the general range of ability, under¬ 
standing language and general intelligence 
are closely intertwined, although not per¬ 
fectly correlated. In extreme cases the pro¬ 
cesses do become distinct; poets are not 
necessarily good at high-level mathematical
reasoning, nor are mathematicians necessarily
literary virtuosos. However, it is

33 Hunt, 1987. 

34 Palmer et al., 1985. 
Figure 6.3. A progressive matrix item of intermediate difficulty.
The task is to decide which of the figures below the line is needed
to replace the blank item in the 3 x 3 matrix above the line. The
problem was constructed as a demonstration, using Carpenter, Just,
and Shell’s (1990) rules for constructing progressive matrix items.


important to keep this distinction in per¬ 
spective. People who have an elegant com¬ 
mand of the language do not display 
incompetence in logical and mathematical 
reasoning, nor are logicians and mathemati¬ 
cians likely to be inarticulate. 

At the other end of the intelligence scale, 
low general intelligence is almost always 
associated with a pervasive low level of ver¬ 
bal comprehension. On occasion, though, 
we find a disassociation between general rea¬ 
soning and some aspects of verbal skill. Here 
is an interesting example. 

Williams syndrome is a genetically deter¬ 
mined type of mild to moderate men¬ 
tal retardation. Patients with Williams syn¬ 
drome display language abilities that are 
considerably higher than their general rea¬ 
soning abilities. At one time it was thought 
that this was evidence for a disassociation 
between linguistic skills and general cogni¬ 
tion, but further research has shown that 
that is not quite accurate. Williams syn¬ 
drome patients show surprising verbal skill, 
compared to other people suffering from 
the same level of general mental deficiency. 
However, their use of language lags behind 
that of age-matched children with normal 
mental capacities. One of the most interest¬ 
ing deficiencies is their tendency to react to 


the literal meaning of statements, thus fail¬ 
ing to grasp the meaning of metaphorical 
statements or irony. 35 A Williams syndrome 
patient might profoundly misinterpret state¬ 
ments like He’s full of baloney or I suppose 
he’ll relax by going skydiving. 

To go further we have to consider 
what language comprehension is. Walter 
Kintsch, a University of Colorado profes¬ 
sor, has developed a useful framework for 
understanding the comprehension process. 36 
Kintsch distinguished three stages in com¬ 
prehension. First there are the purely lin¬ 
guistic processes of retrieving word mean¬ 
ing ( lexical retrieval, in technical terms) 
and applying syntactic rules in order to 
extract meaning from phrases and sentences. 
These tasks comprise low-level comprehen¬ 
sion. Low-level comprehension is followed 
by high-level comprehension, in which sen¬ 
tence and phrase meanings are incorporated 
into a text model, essentially an understand¬ 
ing of what a text says. As the text model is 
built, some information is dropped out and 
other information is highlighted. The text 
model is then incorporated into a situation 
model, which represents the meaning of the 

35 Mervis, 2003. 

36 Kintsch, 1998. 
text, in the light of other information that
the comprehender has and sees relevant. 

Kintsch presents text comprehension as 
a highly interactive process. Sentence com¬ 
prehension, building the text model, and 
building the situation model do not take 
place in series; there are many feedback 
loops. Going into the theory in detail would 
be rather far afield from our concern with 
individual differences. Instead I offer exam¬ 
ples in the spirit of Kintsch’s analysis, with¬ 
out going into details. 

The following statement was made by 
Senator John McCain, about Senator Hillary 
Clinton, at a time when McCain was a can¬ 
didate for the 2008 Republican presidential 
nomination and Clinton was a candidate for 
the Democratic nomination. 

In case you missed it, a few days ago,
Senator Clinton tried to spend $1 million
on the Woodstock Concert Museum. Now,
my friends, I wasn't there. I'm sure it was
a cultural and pharmaceutical event ... I
was tied up at the time. 37

Senator John McCain, Oct. 29, 
2007, at a Republican debate in 
Orlando, Florida 

The following propositions make up my 
(informal) text base 

Senator Clinton tried to spend a million 
dollars. 

The money was for the Woodstock Concert 
Museum. 

McCain was not at (the Woodstock 
concert). 

McCain believes the concert was a cultural 
and pharmaceutical event. 

McCain was occupied with other matters 
at the time. 

Certain statements in the text are unim¬ 
portant - for example, the qualifier “in 
case you missed it.” They drop out of the 
text base. Strictly speaking, the statement 
“I wasn’t there” is ambiguous; it could refer 

37 McCain’s statement was accurate. Clinton had 
joined with New York’s other senator, Charles 
Schumer, in an attempt to secure funds for the 
museum. 


either to the concert or to the time when 
Senator Clinton tried to acquire money for 
the museum. 

The situation model puts this in the con¬ 
text of unstated but widely known infor¬ 
mation about the Woodstock concert and 
about McCain himself. Of course, such a 
model depends upon the comprehender’s
knowledge. Here is what my situation model 
is. Information not specifically stated in the 
text base is shown in parentheses. 

Clinton wants to appropriate one million 
dollars for a museum commemorating the 
Woodstock concert. (Government money, 
not her own.) 

The Woodstock concert was a cultural 
event. (It was the icon of the radical youth 
culture of the late 1960s and early 1970s.)

There was a great deal of (illegal) drug use 
at the concert. 

At the time of the concert McCain was not 
present. (He was a prisoner of war in Viet¬ 
nam, where he was badly treated, i.e., he 
was literally tied up.) 
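
The distinction between what the text supplies and what the comprehender supplies can be sketched as a simple data structure. This is an informal illustration, not Kintsch's formal notation: the class and function names are assumptions, and the third entry paraphrases the inline "(illegal)" as a separate inference.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Proposition:
    statement: str                   # supported by the text base
    inference: Optional[str] = None  # knowledge the comprehender adds

# The situation model listed above; parenthetical material becomes `inference`.
SITUATION_MODEL = [
    Proposition(
        "Clinton wants to appropriate one million dollars for a museum "
        "commemorating the Woodstock concert.",
        "Government money, not her own.",
    ),
    Proposition(
        "The Woodstock concert was a cultural event.",
        "It was the icon of the radical youth culture of the late 1960s "
        "and early 1970s.",
    ),
    Proposition(
        "There was a great deal of drug use at the concert.",
        "The drug use was illegal.",
    ),
    Proposition(
        "At the time of the concert McCain was not present.",
        "He was a prisoner of war in Vietnam, where he was badly treated, "
        "i.e., he was literally tied up.",
    ),
]

def supplied_knowledge(model):
    """Return only the parts of the model that are not stated in the text."""
    return [p.inference for p in model if p.inference is not None]

if __name__ == "__main__":
    for item in supplied_knowledge(SITUATION_MODEL):
        print(item)
```

A reader lacking any one of these inferences would hold the same text base but a different, poorer situation model.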

An even wider situation model assumes 
that McCain was contrasting his undeni¬ 
ably patriotic record during the Vietnam 
War to Clinton’s attempt to memorialize 
an event that many in his Republican audi¬ 
ence considered an example of moral decay 
during the “hippy” era. Other readers might 
develop other models. In each case the situa¬ 
tion model will depend heavily upon knowl¬ 
edge that the comprehender brings to bear. 
This depends upon both what the com¬ 
prehender knows and what he or she sees 
as relevant to understanding the current 
statement. 

Our interest is in the role of individual 
differences in information processing during 
both the low-level stage of understanding 
what the text literally says and the high-level 
stage of understanding what it means. 

6.4.1. Information Processing and
Low-Level Linguistic Skills 

Every normal human being acquires a com¬ 
plex set of rules for speaking and listening, 
up to the level of analysis of sentences and
phrases, in a stunning display of tacit learn¬ 
ing. But we are not all equally adept lin¬ 
guists. There are subtle differences among 
individuals in their ability to understand 
language. 

Different people know different words. 
A score on a brief vocabulary test is a 
guide to a person's general intelligence. 38 
For instance, the vocabulary test included 
in the WAIS has a loading of .80 on the 
general factor. 39 Having a small vocabulary 
does not necessarily reflect a deficiency in 
information-processing capability, for dif¬ 
ferences in vocabulary may simply reflect 
differences in a person's social environment. 
Word knowledge, in itself, is part of intel¬ 
ligence, for your vocabulary does, in part, 
determine how well you can function in 
society. Just knowing words, however, is 
not an information-processing component 
of intelligence. 

People differ in the speed with which 
they can retrieve well-known words, which 
is an information-processing capability. Peo¬ 
ple who have high scores on tests of verbal 
comprehension recognize common words 
more rapidly than people with low test 
scores. 40 They also recognize well-known 
semantic relationships, answering questions 
like “Is a deer an animal?” more rapidly 
than their low-scoring counterparts. 41 This 
appears to be partly a general processing- 
speed effect and partly a more restricted ver¬ 
bal skill. 

There is one piece of the mechanics of 
word recognition that is not related to verbal 
comprehension test scores. In general, peo¬ 
ple are quicker to recognize a word when 
it is presented shortly after a related word. 
The word NURSE will be recognized more 
quickly if a person is first asked to recognize 
a related word, like DOCTOR, than if an 
unrelated word, like BUTTER, is presented 
first. This phenomenon is called semantic 
priming. While priming is obviously useful in 
verbal comprehension, there are only small 

38 Carroll, 1993. 

39 Gignac, 2006. 

40 Palmer et al., 1985.

41 Goldberg, Schwartz, & Stewart, 1977. 


individual differences in priming, and they 
are not related to individual differences in 
verbal comprehension. 42 

Moving from word recognition to sen¬ 
tence recognition, we find both effects of 
accuracy of lexical retrieval and effects of 
working memory. Marcel Just and Patri¬ 
cia Carpenter 43 measured people’s work¬ 
ing memory, using the reading span task 
described earlier. They then determined the 
time that their participants took to analyze 
complex sentences and how accurately they 
were able to analyze them. Here are two 
illustrations of their work. 

Consider the two sentences 

The evidence examined by the lawyer 
shocked the jury. 

The defendant examined by the lawyer 
shocked the jury. 

In the first sentence examined must intro¬ 
duce a relative clause modifying evidence, 
because evidence, being inanimate, cannot 
examine anything. In the second case exam¬ 
ined might refer to an activity by the defen¬ 
dant or it might introduce a relative clause. 
The ambiguity cannot be resolved until the 
phrase by the lawyer is encountered. At 
an absolute level, people with high work¬ 
ing memory capacity read both sentences 
faster than people with low working mem¬ 
ory capacity. However, people with high 
capacity read the first sentence more rapidly 
than the second, that is, they took advantage 
of the ambiguity resolution afforded by the 
inanimate quality of evidence. People with 
lower memory spans read both sentences at 
about the same speed. 

Now consider the following two sen¬ 
tences: 

The experienced soldiers warned about the 
raid before the midnight attack. 

The experienced soldiers warned about the 
raid conducted the midnight attack. 

The phrase The experienced soldiers 
warned about the raid is ambiguous; it could 

42 Palmer et al., 1985. 

43 Just & Carpenter, 1992; MacDonald, Just, & 
Carpenter, 1992. 
refer to a warning either given by or given to
the soldiers. The ambiguity is resolved when 
the word before or conducted is read. Read¬ 
ers with high working memory capacities 
slowed down when they encountered the 
ambiguity-resolving word. People with low 
working memory capacities did not. When 
asked questions about the interpretation of 
the second sentence, such as Did anyone
warn the soldiers?, people with high spans 
were more likely to answer the question 
correctly. 

Just and Carpenter interpreted these 
results as showing that the high-span
(and, by inference, high-working-memory-capacity)
readers carried forward both
meanings of the ambiguous phrase until 
the ambiguity was resolved, while the low- 
span readers carried forward only the pre¬ 
ferred alternative, that the soldiers did the 
warning. 

Just and Carpenter also considered the 
effects of an extrinsic task on sentence 
comprehension. Recall that in the mem¬ 
ory span task a person reads a sentence, 
stores the last word, then reads another sen¬ 
tence, and so forth. Memory span is deter¬ 
mined by the number of words that can 
be recalled as more and more sentences are 
read. Just and Carpenter turned this proce¬ 
dure around. First they established people's 
memory spans, using a fixed set of sentences. 
In a separate procedure, they presented 
material to be retained in memory and then 
presented sentences of varying complexity, 
and determined whether or not the readers 
understood them. This gave them a mea¬ 
sure of sentence-processing capability in the 
presence of a concurrent short-term mem¬ 
ory load. Understanding decreased as sen¬ 
tence complexity increased, but people with 
high spans were less bothered by the concur¬ 
rent memory load than were people with 
low spans. 

Just and Carpenter concluded that work¬ 
ing memory should be considered a capac¬ 
ity for processing, rather than a reflec¬ 
tion of either accurate lexical retrieval or 
short-term storage capacity. I would put 
this somewhat more neutrally. Working 
memory is a system of storage, retrieval, 


processing, and attention control functions. 
The important thing is how well these func¬ 
tions work together to create a system for 
managing information. When dealing with 
verbal comprehension we regard working 
memory as a component of verbal ability; 
when we deal with reasoning more gener¬ 
ally, we talk about the effect of working 
memory on g. In either case individual dif¬ 
ferences in the effectiveness of the working 
memory system are central to individual dif¬ 
ferences in cognition. 

6.4.2. Higher-Order Comprehension
Processes 

It is difficult to distinguish higher-order ver¬ 
bal comprehension from intelligence, for 
some of the most complicated items on cog¬ 
nitive tests are verbal comprehension items. 
Doing well on the SAT requires comprehen¬ 
sion of quite complex statements. It is some¬ 
what more profitable to present an analysis 
of how texts are comprehended, continu¬ 
ing with an informal use of Kintsch's model. 
The analysis will indicate the points at which 
information-processing constraints become 
important. 

The following rather convoluted passage 
appeared in the Yale Alumni Magazine.

Whatever fails to accord with the values 
of political liberalism fits uncomfortably 
within the range of possibilities that the pre¬ 
vailing conception of diversity permits stu¬ 
dents to acknowledge as serious contenders 
in the search for an answer to the
first-person question of what living is for.

Kronman, 2007, p. 26. 44

My first reaction when I read this was 

Eh? 

All-purpose word in Canadian 
English 

But, with an effort, this forty-five-word sen¬ 
tence makes sense. You just have to take it 
stage by stage. 

Whatever fails to accord with political lib¬ 
eralism -> nonliberal ideas 

44 Kronman is a former Dean of the Yale Law School. 




Parse phrase, refer to long-term memory 
for semantic references, and store in 
memory. 

fits uncomfortably within the range -> is 
outside of, not permitted by 

Parse phrase, refer to long-term memory 
for semantic references, store result in 
memory. Working memory now holds 
nonliberal ideas not permitted by. 

of possibilities that the prevailing conception 
of diversity permits -> ideas permitted by 
the current concept of diversity 

Parse phrase, refer to long-term memory, 
identify possibilities with ideas. 

Working memory now holds nonliberal 
ideas not permitted by the current con¬ 
cept of diversity. 

Next we encounter students. This changes 
the presumed grammatical structure of the 
sentence. It turns out that “concept of diver¬ 
sity” is the subject of the sentence, not the 
object of the preposition by. We have an 
example of updating, one of the functions 
of working memory. Appropriate rearrange¬ 
ment of the contents of working memory 
produces 

The current concept of diversity does not per¬ 
mit students 

and non-liberal ideas is free floating until 
the next word occurs. 

to acknowledge 

This anchors the term nonliberal ideas. 
Working memory now holds 

The current concept of diversity does not per¬ 
mit students to acknowledge nonliberal 
ideas. 

We next encounter a mouthful (memo¬ 
ryful?) of relative clauses. 

as serious contenders in the search for an 
answer to the first-personal question of 
what living is for 

These require similar parsing and analy¬ 
sis. I will not go through the steps. The 
result is fairly simple. 

Current concepts of diversity do not let stu¬ 
dents consider nonliberal views. 
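
How much the passage has been compressed can be checked with a simple word count. The sketch below is only a convenience check: it splits on whitespace, so a hyphenated compound such as "first-person" counts as one word, and the variable names are introduced here for illustration.

```python
ORIGINAL = (
    "Whatever fails to accord with the values of political liberalism "
    "fits uncomfortably within the range of possibilities that the "
    "prevailing conception of diversity permits students to acknowledge "
    "as serious contenders in the search for an answer to the "
    "first-person question of what living is for."
)

TEXT_BASE = (
    "Current concepts of diversity do not let students consider "
    "nonliberal views."
)

def word_count(text):
    """Count whitespace-separated words; hyphenated compounds count once."""
    return len(text.split())

if __name__ == "__main__":
    print(word_count(ORIGINAL), word_count(TEXT_BASE))
```

By this count the text base is roughly a quarter of the length of the original sentence.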

We have gone from forty-five words in 
the statement to eleven in the text base. We 


can move to the situational model by recall¬ 
ing that Kronman was writing in 2007, and 
that he was addressing Yale alumni. A com¬ 
pleted part of the situation model is 

There has been concern over academic 
biases against conservative thought. 

Professor Kronman is trying to explain the 
current situation to the Yale alumni. 

Kronman says that 

Views in favor of cultural diversity pre¬ 
vail on campus. 

These views do not permit students to 
consider competing ideas. 

Perhaps Kronman could have written 
more simply. My purpose is not to edit 
his writing. It is to show that comprehen¬ 
sion of language, a very important part 
of human intelligence by any definition, 
requires a complex interplay between the 
working memory and long-term memory 
systems. Long-term memory does not just 
act as a provider of information and syntac¬ 
tic rules. It also holds information needed to 
understand what the text means in the con¬ 
text in which it is stated. Neither McCain’s 
nor Kronman’s statement can be understood 
unless the comprehender knows, and sees as 
relevant, a great deal of cultural information 
about American political and social issues in 
the early twenty-first century. 

The ability to bring such information to 
bear is an important part of intelligence. It 
has to be evaluated if one wants to make a 
serious claim that one’s test measures ver¬ 
bal intelligence in any meaningful way. This 
poses something of a problem to the test 
maker. There may be situations in which the 
test maker does not want to evaluate pos¬ 
session of knowledge, because possession of 
knowledge depends upon a particular cul¬ 
tural background. At the same time the test 
maker does want to evaluate the ability to 
use knowledge to build a situation model. 
What to do? 

The answer is to be clever. Here is an 
elegant example of evaluating an elementary 
school student’s ability to build a situation 
model. It was a test item in an Australian 
examination intended to assess a student’s
ability to understand irony. 45 

Text: Lovely mosquito, sitting on my arm 
Stay right where you are, I mean you no
harm. 

Still as a statue, stand right where you're 
at. 

I only mean to give you a pat. 

Question: Does the writer like the 
mosquito?

The best of our computer programs for 
automatic language comprehension would 
have a hard time answering this question. I 
asked a reasonably intelligent fourth grader, 
who replied (with an evil grin) 

He wants to kill it. 

6.5. Information-processing 
Mechanisms Underlying Visual-spatial 
Reasoning 

Visual-spatial reasoning is included in psy¬ 
chometric models as the Gv (visual intel¬ 
ligence) second-order factor in the three- 
stratum model, and by the perceptual and 
rotational dimensions in the g-VPR model. 46 
The “visual” term is justified because the
factor covers the ability to detect, recognize,
and analyze objects in the visual field and,
in visual imagery, the ability to manipulate
objects in the mind’s eye. The two
are closely related, although not perfectly 
correlated. 47 The “spatial” term refers to
the ability to reason about real or imagined 
movement in space, including one’s own 
position and movement relative to other 
objects (orientation). Orientation ability is 
not measured in the major battery-type 
intelligence tests, but it is evaluated in some 
personnel-selection situations, particularly 
in aviation. 

45 I encountered this while attending a National 
Research Council workshop on assessment of stu¬ 
dent achievement. I reproduce it from memory, and 
may not have the verse absolutely correct. 

46 Carroll, 1993; Johnson & Bouchard, 2005a. 

47 Burton & Fogarty, 2003; Kosslyn, 1980; Poltrock & 
Brown, 1984. 


Table 6.1. Correlations between different 
visual-spatial abilities 

Narrow Factor    CF     CS     SR     VZ 
CS               .65 
SR               .60    .32 
VZ               .58    .65    .77 
MS               .52    .43    .43    .46 

Source: Data excerpted from Burton & Fogarty, 
2003, with permission from Elsevier. 


An analogous factor, auditory ability 
(Ga), has been identified in research on the 
three-stratum model. 48 Auditory ability was 
left out of the g-VPR model simply because 
none of the test batteries analyzed by John¬ 
son and Bouchard contained any auditory 
tests. It is difficult to fit tests of audition 
into a paper-and-pencil testing paradigm, 
but the recent developments in delivery of 
computer-controlled music should make it 
easier to explore individual differences in 
auditory ability in the future. 

A study by two Australian researchers, 
Lorell Burton and Gerald Fogarty, provides 
a good idea of the nature of visual-spatial 
ability. 49 Previous research on the three- 
stratum model had identified the following 
specialized factors for visual-spatial abilities. 

Closure of forms (CF): The ability to 
detect a hidden form in a larger display, 
particularly if the detection requires that 
the examinee “break a set” of looking at a 
larger or different figure. As an example, 
count the number of vertical lines in this 
sentence. To do so you have to detect 
features within familiar letters. 

Speed of closure (CS): The speed 
with which you can recognize easily 
detectable forms, such as a triangle or 
circle. This is closely related to the more 
general concept of processing speed. 

Speed of rotation (SR): The speed with 
which one can recognize a simple figure 
(e.g., a letter) rotated out of its usual 
orientation. 

Visualization (VZ): The ability to 
imagine what a visual figure will look like 

48 Carroll, 1993; Horn & Stankov, 1982. 

49 Burton & Fogarty, 2003. 
Table 6.2. Loadings of visual-spatial and imagery first-order factors on a general 
visual-spatial reasoning factor 

Factor               Loading 
CF                   .74 
CS                   .69 
SR                   .80 
VZ                   .87 
Image Quality        .83 
Image Speed          .59 
Image Self-report    .46 

Source: Data excerpted from Burton & Fogarty, 2003, with permission from Elsevier. 


if its orientation is changed. An example 
is the “paper folding” test, in which the 
examinee is shown a flat piece of paper 
with dotted lines on it. The examinee 
is to indicate what figure can be con¬ 
structed from folding the paper along the 
dotted lines. 

Memory for shapes (MS): The ability 
to hold a shape in short-term memory 
for a brief while. 

In the terms of the g-VPR model 
these abilities would be second-stratum 
factors - more general than specific tasks 
but still highly specific abilities. They would 
be subsumed by the more general P and 
R dimensions at the third stratum of the 
model. 

Tests of the abilities just listed, and a 
number of others, were given to slightly over 
200 undergraduate students (approximately 
evenly split between men and women). 
Table 6.1 shows the resulting correlation 
matrix. The correlations are substantial and 
positive, an indication of a single factor, 
visual-spatial ability. 
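
One rough way to check the "single factor" reading is to ask how much of the total variance in the Table 6.1 correlations falls on the first principal component. The sketch below does this in plain Python using power iteration. It is an illustration only: principal components are a stand-in here, not the factor-analytic procedure Burton and Fogarty used, and the function name first_component is hypothetical.

```python
# Correlations from Table 6.1, with 1.00 on the diagonal.
# Row and column order: CF, CS, SR, VZ, MS.
LABELS = ["CF", "CS", "SR", "VZ", "MS"]
R = [
    [1.00, 0.65, 0.60, 0.58, 0.52],
    [0.65, 1.00, 0.32, 0.65, 0.43],
    [0.60, 0.32, 1.00, 0.77, 0.43],
    [0.58, 0.65, 0.77, 1.00, 0.46],
    [0.52, 0.43, 0.43, 0.46, 1.00],
]


def first_component(matrix, iterations=500):
    """Return (largest eigenvalue, unit eigenvector) found by power iteration."""
    n = len(matrix)
    v = [1.0 / n ** 0.5] * n  # start from a uniform unit vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v' R v gives the eigenvalue for the converged vector.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, v


eigenvalue, vector = first_component(R)
share = eigenvalue / len(R)  # proportion of total variance on the first component
# Principal-component loadings: eigenvector entries scaled by sqrt(eigenvalue).
loadings = {name: round(x * eigenvalue ** 0.5, 2) for name, x in zip(LABELS, vector)}

print(f"first eigenvalue: {eigenvalue:.2f}, share of variance: {share:.0%}")
print(loadings)
```

If the first component carries well over half of the variance and every test loads positively on it, that is consistent with, though not proof of, a single underlying visual-spatial ability.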

6.5.1. Imagery 

Visual-spatial reasoning tasks force exami¬ 
nees to think about things that they can 
see. Studies of imagery require thinking 
about things that the examinees imagine. 
For instance, you could ask a person to imag¬ 
ine a letter E in its normal form, and then 
ask him or her to describe how it would 
look if it were to be rotated ninety degrees 
clockwise. An extensive body of research has 
shown that the sorts of operations that can 
be performed on images are at least loose 
analogs to the sorts of operations, like rota¬ 
tion, that can be applied to a percept. 50 If 

50 Kosslyn, 1980. 


operations on perceived and imagined fig¬ 
ures use the same mental operations, indi¬ 
vidual differences in visual-spatial ability 
should be related to individual differences 
in imagery. 

Burton and Fogarty included a number 
of objectively measured imaging tasks in 
their test battery. An example of such a 
task is given in Panel 6.8. Performance on 
images reduced to two factors: image qual¬ 
ity, the accuracy of information contained 
in a person’s image, and imaging speed, 
how fast a person can construct an image. 
Table 6.2 shows the loadings of both percep¬ 
tual and imagery factors on a single visual- 
spatial reasoning factor. Clearly imagery and 
visual-spatial reasoning are closely related. 
Note, though, that people’s reports of their 
imagery have a lower relation to the visual- 
spatial factor than the behavioral measures 
of imagery do. This is interesting, because 
many studies of imagery have relied on self- 
report, which may not be terribly accurate. 
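
The size of the gap between self-report and the behavioral measures is easier to see when the loadings in Table 6.2 are squared, since a squared loading is the proportion of a measure's variance shared with the general factor. The sketch below does that arithmetic. It also computes the correlation between two measures that a strict one-factor model would imply (the product of their loadings); those implied correlations are an assumption of the sketch, not values reported by Burton and Fogarty, and the function names are hypothetical.

```python
# Loadings from Table 6.2 on the general visual-spatial reasoning factor.
LOADINGS = {
    "CF": 0.74,
    "CS": 0.69,
    "SR": 0.80,
    "VZ": 0.87,
    "Image Quality": 0.83,
    "Image Speed": 0.59,
    "Image Self-report": 0.46,
}


def shared_variance(loading):
    """Proportion of a measure's variance shared with the general factor."""
    return loading ** 2


def implied_correlation(a, b):
    """Correlation between two measures implied by a strict one-factor model."""
    return LOADINGS[a] * LOADINGS[b]


for name, loading in LOADINGS.items():
    print(f"{name:18s} loading {loading:.2f}  shared variance {shared_variance(loading):.2f}")

print(round(implied_correlation("VZ", "Image Quality"), 2))      # 0.72
print(round(implied_correlation("VZ", "Image Self-report"), 2))  # 0.4
```

On these figures image quality shares roughly two thirds of its variance with the general factor, while self-reported imagery shares only about one fifth.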

6.5.2. Spatial Orientation 

Spatial orientation is the ability to develop 
an internal representation of an exterior 
space, including one’s own position in that 
space. There are very large individual dif¬ 
ferences in people’s ability to do so. Some 
individuals seem to have a keen awareness of 
their spatial orientation, both with respect 
to objects in their immediate environment 
(e.g., what is immediately to your right 
rear?) and with respect to larger spatial lay¬ 
outs (e.g., can you draw the layout of the 
building you are in right now?). Others have 
a great deal of trouble answering these sorts 
of problems. This does not mean that these 
people go around lost! Instead they seem to 
rely on memories for routes between key 
locations. Panel 6.9 describes a case that 
Panel 6.8. An Imagery Task 

Here is an example of a creative imagery 
task. The instructions are 

Imagine the letter A. (Pause) Now imag¬ 
ine a triangle to the right of the A. (Pause) 
Imagine this triangle so that it is now 
facing upside down. (Pause) I want you to 
place the letter A inside the center of the 
triangle such that all end points or edges 
match up. 

Participants were instructed to write 
down as many of the emergent forms as 
they were able to detect. They were told 


not to draw the image held in their mind 
until they had written down everything 
that they could see in their image. 

A picture of what a participant might 
image is shown immediately below. The 
emergent forms detected might include 
one small triangle, two larger triangles, a 
diamond shape, the letter “w,” and so on. 



Panel 6.9. Lost in the Hospital 

A study was conducted of the extent 
to which nurses understood the layout 
of a hospital in which they had worked 
for over a year. All the nurses regularly 
went about their daily activities with¬ 
out getting lost. The experimenter asked 
the nurses how they would go from one 
familiar location to another if their nor¬ 
mal route was blocked. Some of the 
nurses could develop alternate routes; 
some could not.* 


This is an example of a more general 
finding. When people explore a space, 
some of them will develop a mental rep¬ 
resentation functionally equivalent to a 
mental map of the environment. This is 
sometimes referred to as a “surveyor’s 
representation.” Others do not reach this 
level of understanding of their surround¬ 
ings. Instead they develop memories for 
routes from one location to another.† 

* Moeser, 1988. 

† Hunt, 2002, Chapter 6, section 5. 


illustrates, dramatically, how strong these 
individual differences can be. 

A number of laboratory tasks have been 
designed to evaluate different aspects of ori¬ 
entation ability. This research has generally 
been conducted outside of the mainstream 
of research on intelligence, even though 
maintaining orientation is certainly part of 
the concept of intelligence. 

The reason for this situation is under¬ 
standable. In order to evaluate someone’s 
orientation ability you have to determine 
how well they can explore an unfamiliar 
environment. This is an expensive, time- 
consuming process that does not fit into 
the classic testing paradigm. About the only 
way that orientation can be evaluated under 
normal testing conditions is to show the 


examinee one or two pictures of an object, 
taken from different perspectives, and ask 
what the object will look like from another 
perspective. This gets at one aspect of ori¬ 
entation, but only one. 

Virtual environment technologies pro¬ 
vide a way of getting around the logisti¬ 
cal problems inherent in evaluating spatial 
orientation. 51 In this technology the examinee 
has to move through computer-generated 
artificial worlds. Large individual 
differences in orientation are found. 
These are related to, although not identical 
to, both conventional tests of visual-spatial 
ability, particularly spatial rotation and 

51 Waller, Beall, & Loomis, 2004. 
perspective-taking tasks, and skill at orientation 
in real-world environments. 52 

Spatial orientation serves as a shining 
example of a cognitive skill that is socially 
important for which there are wide indi¬ 
vidual differences, and, as we shall see in 
the following chapters, that has both bio¬ 
logical and cultural bases, but that has been 
almost ignored in research on intelligence 
due to overreliance on the standard testing 
paradigm. 

6.5.3. The Relation between 
Visual-spatial Reasoning and 
Information Processing 

How do information-processing models 
enhance our understanding of visual-spatial 
reasoning? The answer to this turns out, sur¬ 
prisingly, to be “very little,” not because 
information-processing models are irrele¬ 
vant, but because they are incorporated into 
several of the psychometric tests used to 
identify visual-spatial processing. This can 
be seen by contrasting two workhorse tasks 
used to define verbal intelligence - vocab¬ 
ulary tests and paragraph comprehension 
tests - to two workhorse tasks used to define 
visual-spatial reasoning - closure and rota¬ 
tion tasks. 

The two verbal reasoning tasks are com¬ 
plex in themselves. Recognizing the mean¬ 
ing of a string of letters requires identifica¬ 
tion of the string as a word, retrieving its 
several meanings, and determining the cor¬ 
rect meaning in context. The selection pro¬ 
cess is not trivial. In English, which appears 
to be a particularly ambiguous language, 
the typical word has 2.5 meanings. 53 Paragraph 
comprehension requires recognition 
of words, retrieval and selection of word 
meanings, plus analysis of sentences, and 
then the construction of text and situa¬ 
tion models. Neither psychometric task goes 
to the level of detail in examining verbal 
processing that the information processing 
measures do. 

52 Hegarty & Waller, 2005; Waller, Knapp, & Hunt, 2001. 

53 Hunt & Agnoli, 1991. 



Figure 6.4. A visual closure task. There are five 
topi (a species of antelope) in this picture. Can 
you find them? (Ngorongoro crater, Tanzania, 
August 2007. Photograph by the author.) 

Closure tasks require the detection of 
lines, assignment of boundaries to objects, 
detection of surfaces, and development of 
representations of objects from represen¬ 
tations of surfaces. There is something of 
an analogy to building a text model, or 
even a situation model. The example in 
Figure 6.4 illustrates this; perception is difficult 
until you (a) have a clue about what 
you might see and (b) are given an anchoring 
point. This level of detail is close to 
that used in experimental studies of figure 
detection. 

In a rotation task a person is shown a 
relatively simple picture and asked what it 
would look like if it were to be presented at a 
different orientation. Figure 6.5 presents an 
example. There are strong individual differ¬ 
ences in the ability to deal with tasks such 



Figure 6.5. A rotation problem. If gear A is 
moved in the direction indicated by the arrow, 
which way will gear C move? 
as this. 54 Rotation tasks actually appeared 
on psychometric tests of visual-spatial ori¬ 
entation before they were studied in exper¬ 
imental laboratories. The only difference 
between the psychometric and information¬ 
processing techniques is that the psycho¬ 
metric tests determine how many rotation 
problems a person can solve in a fixed period 
of time, while the laboratory procedures 
determine how long a person takes to solve 
individual rotation problems. 
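
The two scoring schemes can be put side by side in a few lines. The sketch below scores the same set of per-problem solution times both ways: as the number of problems finished within a fixed time limit (the psychometric approach) and as the mean time per problem (the laboratory approach). The solution times, the ten-second limit, and the function names are all hypothetical, and the sketch assumes every problem is answered correctly.

```python
from statistics import mean


def psychometric_score(solution_times, time_limit):
    """Count the problems completed, in order, before the time limit runs out."""
    elapsed = 0.0
    solved = 0
    for t in solution_times:
        if elapsed + t > time_limit:
            break  # this problem would not be finished in time
        elapsed += t
        solved += 1
    return solved


def laboratory_score(solution_times):
    """Mean time taken to solve an individual problem."""
    return mean(solution_times)


# Hypothetical seconds per rotation problem for two examinees.
fast = [2.0, 2.5, 3.0, 2.5, 2.0, 3.0]
slow = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]

print(psychometric_score(fast, time_limit=10.0), laboratory_score(fast))  # 4 2.5
print(psychometric_score(slow, time_limit=10.0), laboratory_score(slow))  # 2 5.0
```

Both scores rank the two examinees the same way, which is the point of the passage: the measures differ in form, not in what they are measuring.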

Visual-spatial reasoning factors generally 
have high loadings on the general reasoning 
factor, g. From an information-processing 
view this is not surprising. Visual-spatial 
reasoning tasks often require the develop¬ 
ment and manipulation of internal rep¬ 
resentations of visual-spatial information. 
Baddeley's model of working memory con¬ 
tained a "visual-spatial scratch pad." Subse¬ 
quent research has shown that measures of 
Baddeley's scratch pad and operations on its 
content are distinct from, although correlated 
with, measures of verbal working memory. 55 

As was the case in discussing the work¬ 
ing memory-g relation, trying to pick visual- 
spatial working memory apart into sepa¬ 
rate tasks does not reveal any single, crucial 
process that explains visual-spatial reason¬ 
ing. The visual-spatial working memory sys¬ 
tem is just that, a system. Reasoning about 
percepts, images, and space is an emergent 
property produced by the synthesis of stor¬ 
age, attention to, and processing of, percepts 
and images. 

6.6. Summary Comments on 
Information Processing and 
Intelligence 

Individual differences in information¬ 
processing capacity make a substantial 
contribution to individual differences in 
cognitive skills - in essence, to intelligence. 
However, the two are not identical. Rather, 
information-processing capacity constrains 
a person's intelligence. 

54 Hegarty, Just, & Morrison, 1988. 

55 Ackerman, Beier, & Boyle, 2002; Logie, 1995. 


Thought results from the development 
of internal representations of external prob¬ 
lems. This means that there has to be a stor¬ 
age place (or places) for the representation 
of percepts and images. This includes the 
possibility that there are separate storage 
places for, say, linguistic and visual-spatial 
information. There have to be processes 
for fetching information and transforming 
it, in an orderly fashion. There has to be 
some way of prioritizing the processing of 
crucial information. That is what the con¬ 
trol of attention is all about. Everything has 
to work together. And finally, it is better 
if everything is done quickly. The impor¬ 
tant question to ask about the information¬ 
processing contribution to intelligence is not 
whether this or that isolated process is effi¬ 
cient or deficient. It is whether the total sys¬ 
tem is functioning well enough, and quickly 
enough, to get the job done. 

Information-processing capacities alone 
do not produce intelligent behavior. Intel¬ 
ligent behavior is evidenced by appropriate 
responses to external problems. Choosing 
the right response requires the construction 
and manipulation of an internal representa¬ 
tion of a problem. Take an illustrative case, 
buying a new car. 

In order to decide what car to buy you 
have to think about the different ways that 
you are going to use the vehicle, the capabili¬ 
ties of automobiles offered for sale, the price 
range, the cost of operation, and so on and 
so forth. Working memory is a shorthand for 
the mental workspace and tools you use to 
build and manipulate your representation of 
the situation. The more pieces of informa¬ 
tion you can consider at once, the better 
you can balance relevant information. The 
faster your processing speed, the less likely 
it is that information in the transient work¬ 
ing memory stores will decay, and the more 
likely it is that you will be able to juxtapose 
relevant pieces of information. For instance, 
you must realize that economic considera¬ 
tions include both the purchase price and 
the costs of operation and insurance. 

I could just as easily have taken an exam¬ 
ple from cooking, political decision mak¬ 
ing, computer programming, or any other 




field where people use their ability to think. 
Thinking about anything requires having rel¬ 
evant knowledge. In order to bring that 
knowledge to bear you have to use the 
machinery provided by your information¬ 
processing system. 

Although no one of the components 
of the information-processing system dic¬ 
tates thought, each of the components 
constrains thought. Just how different con¬ 
straints operate depends on both the situa¬ 
tion and the person. We have seen exam¬ 
ples of situational variation throughout this 
book. The Wonderlic Personnel Test (WPT, 
see Chapter 2) evaluates a person’s ability 
to answer many simple questions, quickly. 
Processing speed and the ability to switch 
tasks (control of attention, executive func¬ 
tioning) constrain performance on the WPT. 
Now recall the convoluted sentence on lib¬ 
eralism written by the Dean of Yale’s Law 
School. It took a big working memory to get 
through those forty-five words. 

The nature of information-processing 
constraints varies in different populations. 
Comparing results from studies of intelli¬ 
gence in university populations and in the 
elderly, we find that the variations in pro¬ 
cessing speed are not major determinants of 
performance for sophomores, but they are 
for senior citizens. Sophomores are pretty 
good at thinking quickly. One of the most 
consistent findings in research on aging is 
that processing speed decreases substantially 
over the adult years, possibly beginning in 
the thirties. 56 

56 Salthouse, 1996. 
We can see a similar effect in studies of 
the role of attention, short-term memory 
storage, and information-processing mea¬ 
sures in general. Studies that focus on intel¬ 
lectually capable young adults are examin¬ 
ing a group in which individual differences 
in constraints due to information-processing 
capacities are small, relative to their role 
in other populations. Attentional control is 
weak in young children and the elderly; pro¬ 
cessing speed decreases markedly with age, 
even during an adult’s working lifetime; and 
information-processing constraints appear 
to be much stronger in people of low intel¬ 
ligence than in those with normal or high 
intelligence, as assessed by test scores. 57 The 
relationships among these findings have not 
been ignored, but they certainly have not 
been studied adequately. In a review of stud¬ 
ies of the relation between working mem¬ 
ory and intelligence, Ackerman and his col¬ 
leagues observed that less than three percent 
of the correlations reported involved partic¬ 
ipants over thirty years old! 58 

Psychologists would do well to consider 
this fact: 

All sophomores are human. Not all humans are sophomores. 

Future research on the relation between 
intelligence and information processing 
ought to ask how information-processing 
capacities constrain intelligence in differ¬ 
ent populations, as well as how information 
processing constrains thought in people in 
general. 

57 Detterman & Daniel, 1989. 

58 Ackerman, Beier, & Boyle, 2005. 


CHAPTER 7 


Intelligence and the Brain 


The human brain, a 3-pound mass of 
interwoven nerve cells that controls our 
activity, is one of the most magnificent - 
and mysterious - wonders of creation. 

President George H. W. Bush, 
July 17, 1990 (Presidential 
Proclamation 6158 designating the 
1990s as the “Decade of the Brain”) 

He's a nice guy but he played too much 
football with his helmet off. 

President Lyndon Johnson, 
referring to Congressman (later 
President) Gerald Ford. Ford had 
been a collegiate football star. 
Attribution by Schnakenberg 
(2004). 

The two presidents were right. President 
Bush (or his speechwriter) described the 
brain accurately — a very complex bit of 
circuitry. President Johnson’s description of 
Ford may not have been accurate, but he 
was right that bouncing the brain is not a 


good thing. A modern playwright put the 
matter in a slightly different way. 

Merkin's brain has a mind of its own. 

Act I of Below the Belt, a 1997 play 
by Richard Dresser 

Every expression of intelligence is due to 
actions of the brain. What actions a brain 
will take in a given situation depends upon 
both the brain’s structure and its history. 
This chapter will focus on individual dif¬ 
ferences in brain structures and processes 
related to intelligence. The topic is exciting. 
Findings are coming in so fast that it is hard 
to make sense of all of them. My computer 
literature search for papers on BRAIN AND 
INTELLIGENCE retrieved 5,648 citations. 1 
Neither I nor anyone else has read them all. 
There are some who believe that virtually 
all psychology, including the study of intelli¬ 
gence, will soon be reduced to studies of the 
brain. I do not think so. What I do think, 
and what will be stressed throughout this 

1 Using PsycINFO, 30 May, 2009. 
chapter, is that the relation between intelli¬ 
gence and the brain is very important, but 
has to be kept in perspective. Consider this 
analogy. 

Orchestras vary greatly in their ability to 
play music. The same piece played by your 
local high school orchestra and by the New 
York Philharmonic may be recognizably the 
same piece, but the two orchestras do not 
sound the same. There is a positive correla¬ 
tion between size and musical quality. High 
school orchestras tend to be smaller than 
citywide amateur and professional orches¬ 
tras, which are in turn smaller than the 
major philharmonics. However, there can 
be a good deal of variation in the quality of 
music performed by orchestras of the same 
size. Adding performers to your high school 
orchestra will not make it sound like a phil¬ 
harmonic. 

There is also a correlation between intel¬ 
ligence and brain size, but the same cautions 
apply. 

By analogy to studies of brain injury, 
imagine studying the essence of orchestral 
quality by removing one player at a time. To 
start, remove the conductor. The orchestra 
begins to play in a flat, hesitant fashion, and 
can play only those pieces for which it has a 
lot of practice. But it can play, and besides, 
there are inter-orchestra differences when 
the conductor is there. In addition, there 
is the puzzling problem that the conductor 
clearly influences the music being played, in 
spite of the fact that the conductor does not 
make any noise! 

Now try removing all or part of the 
string section, the brass, the woodwinds, or 
the percussion instruments. Each removal 
would affect performance, but the effects 
would depend upon the work being played. 
Removing the strings would affect almost, 
but not quite, all pieces. Playing “76 Trom¬ 
bones” does not require a violin; percussion 
instruments are not needed to play chamber 
music. And there is still the problem that 
there is great variation in the performance 
of intact orchestras. 

Now try an analogy to modern studies of 
how brain metabolism varies with mental 
activity. The orchestral equivalent would be 


measuring the sound level in different parts 
of the orchestra. You would quickly find 
that in all but the simplest pieces the sound 
comes from all over, and that the pattern of 
sound varies far more with the piece being 
played than with the quality of the orches¬ 
tra. The same thing is true in cognition; all 
but the simplest problem-solving activities 
elicit neural activation across the brain, and 
the pattern of activation varies more with 
the activity than with the individual. 

Instead of looking at activity all over the 
orchestra, suppose that we arrange a “laboratory 
study” that isolates the performance of 
individual players, so that we can rate their 
performance. This is what the information¬ 
processing psychologist does, by designing 
situations that isolate working memory or 
visual perception, instead of having them 
work together, as they do in everyday prob¬ 
lem solving. You would find that you were 
getting somewhere, for there would be a 
substantial correlation between the qual¬ 
ity of individual performers and the quality 
of the orchestra. Musicians in philharmon¬ 
ics are much better than musicians in high 
school orchestras. However, if you were to 
look at a narrower range - say, between 
the members of the New York, Cleveland, 
Chicago, and Seattle Philharmonics - you 
would find that the differences were very 
small. 

Then there is the pesky problem of the 
conductor. You would find it very hard to 
evaluate a conductor without an orchestra. 
He or she would look more like a person 
suffering from a minor epileptic fit than a 
musician, and would not make any sound at 
all. How can you reconcile this with the fact 
that orchestras play considerably better 
with a conductor than without? 

In desperation you conduct a “metabolic” 
study, by looking at how much orchestras 
are paid. This is an indicator. Musicians 
in major orchestras earn considerably more 
than musicians in minor ones, and amateurs 
are not paid at all. But is this because being 
paid more produces better music, or is it 
perhaps the other way around? 

Besides, the silent conductor is getting 
paid more than anyone. 
The brain is more complex than an 
orchestra, and cognition is more compli¬ 
cated than musical performance. Cognition 
and music are emergent properties. They 
depend partly upon measurable qualities of 
the parts of the organisms that produce 
them, brain and orchestra, and partly upon 
the interaction between those parts. We 
can make progress in understanding cognition 
(and music) by making measurements 
on parts, but these measurements, alone, 
will not provide a full explanation. You 
should keep this in mind as we discuss the 
brain-mind relations that tell us a lot, but 
not the whole story, about intelligence. 2 

The following two sections provide an 
introductory discussion of the structure of 
the brain and of modern technologies for 
examining brain structures and processes. 
Readers familiar with both topics can jump 
immediately to section 7.3, which begins 
the discussion of major findings relating 
brain variables to intelligence. I urge readers 
not familiar with brain structure or the new 
technologies to refrain from jumping too 
quickly. 

7.1. The Structure of the Brain 

The human brain is a swelling that sits at 
the upper end of the spinal cord. Figure 7.1 
presents a “cartoon” version of the brain, as 
viewed from the left. It is divided into four 
major anatomical structures, called lobes - 
the frontal, temporal, parietal, and occipi¬ 
tal lobes. The occipital lobe, at the back of 
the brain, is primarily concerned with visual 
analysis. It also plays a role in visual reason¬ 
ing. The cerebellum, sitting below and to 
the rear of the cerebral cortex, is largely con¬ 
cerned with automatic motor coordination, 
although it does have some function in 
cognition. 

2 I am not arguing for a duality of mind and brain. If 
we knew the nature of every connection between 
the approximately 86 billion neurons in a person’s 
brain, and if we knew the algorithms the brain uses 
to activate and alter these connections, we would 
know everything there is to know about that per¬ 
son’s cognition. We are so far from having such 
knowledge that, for the foreseeable future, there 
is a place for nonbiological models of intelligence. 



Figure 7.1. A sketch of the human brain, seen 
from the left. Sketch by the author. 


If you view the brain from above you 
would see that a deep fissure divides it into 
left and right hemispheres, connected by 
neural bundles that bridge the fissure. The 
largest of these bundles is the corpus callo¬ 
sum, which provides the main communica¬ 
tion link between the two hemispheres. 

In general, somatosensory and motor 
information is represented contralaterally; 
the right side of the brain controls the left 
side of the body, and vice versa. There are 
some exceptions, notably in the analysis of 
information in the middle of the visual field, 
but these need not concern us. Certain func¬ 
tions that are not directly tied to a side of 
the body may also be differentially localized 
across the hemispheres. Language, which 
primarily depends on structures in the left 
hemisphere, is the best known of the lateralized 
functions. 

There are individual differences in brain 
structures. Some left-handed people have 
their language centers located in the right 
hemisphere, and there are some differences 
between men and women in the localization 
of brain functions. These will be discussed in 
Chapter 11, section 3, which describes male- 
female differences in intelligence. 

Figure 7.2 shows a diagram of sev¬ 
eral subcortical structures that are impor¬ 
tant in cognition. The cingulate gyrus func¬ 
tions as a communication system between 
various areas of the cerebral cortex. It 
also appears to be important in response 
selection. Below the cingulate gyrus is the 
limbic system, which contains the hippocam¬ 
pus, and the amygdala. The hippocampus 



Figure 7.2. The cingulate gyrus and the limbic system. Drawing 
courtesy of the National Institutes of Health. 


plays an important role both in memory and 
in spatial-visual reasoning. The amygdala, 
along with a midbrain structure called the 
hypothalamus, is involved with emotions. 

Six terms are used to define locations in 
the brain. They are: 

Frontal: The forward part of the brain or 
part of the brain being discussed. 

Posterior: The opposite of frontal. 

Ventral: Toward the bottom of the brain 
or the brain region being discussed. 

Dorsal: Toward the top of the brain or 
brain region. 

Medial: Toward the middle of the brain. 

Lateral: Toward the side of the brain. 

These terms may be combined. For 
instance, the dorso-lateral prefrontal cortex 
(DLPFC) is an area at the extreme front of 
the brain (prefrontal), on the upper (dorso) 
outside (lateral) surface. It is important in 
working memory. 

All information held in the brain (memories) 
and all processing upon that information 
(thoughts, percepts, images) is determined 
by the physical form of networks of 
nerve cells, or neurons. The term “physical” 
is important; ultimately all thinking comes 
down to the manipulation of neurons. So is 
the term “network”; we store and process 


information by changing the configurations 
of networks of connected neurons, not by 
changing the states of individual neurons. 
Our memories of our grandmother, or of 
yellow Volkswagens, are coded as configu¬ 
rations of neural elements. Within limits, 
two different individuals may code the same 
information in somewhat different neural 
configurations. 

Neurons can be classified, anatomically, 
as either gray matter or white matter. White 
matter refers to neurons whose axons are 
coated with a fatty substance called myelin. 
Gray matter consists of masses of neurons 
that are involved in computations within a 
local area of the brain. The white matter 
provides long-distance connections between 
brain regions. There is a loose analogy here 
to distributed computing, where networks 
of gray matter play the roles of local comput¬ 
ing centers and the white matter provides 
the cabling that connects the local centers 
to each other. 

Introductory textbooks often contain 
maps showing where different functions lie 
in the brain. The most famous examples 
involve speech. Broca's area, in the left 
posterior frontal region, is associated with 
speech expression, and Wernicke's area, in
the left temporal lobe, is associated with 
retrieval and understanding of semantic and 








syntactic relationships. A great deal has also 
been written about how the left brain is 
specialized for analytic, sequential reasoning 
while the right brain conducts intuitive, par¬ 
allel reasoning. This is an oversimplification. 

The brain does contain centers that are 
specialized for certain kinds of processing, 
but it does not contain regions dedicated 
to broad cognitive or emotional functions. 
The brain functions as a system. The orches¬ 
tral analogy is apt. In understanding human 
intelligence it is useful to consider what cog¬ 
nitive functions are carried out in different 
regions of the brain, but understanding as 
complex a behavioral phenomenon as intel¬ 
ligence requires understanding both where 
different specialized processes are and how 
they mesh to produce thinking. 

7.2. Technologies for Examining 
the Living Brain 

Until the discovery of X-rays at the turn 
of the twentieth century our knowledge 
of brain structure was based upon post 
mortem examinations. These provide a lim¬ 
ited source of information about brain-mind 
relations, because post mortem examina¬ 
tions are biased toward studies of the elderly 
or those who have died an untimely death. 
In both cases the state of the brain fol¬ 
lowing death may not accurately reflect its 
state when the person was alive and fully 
functioning. Nevertheless, a great deal of 
information about brain-mind relations was 
obtained in the nineteenth and first three 
quarters of the twentieth century by study¬ 
ing alterations in behavior following brain 
damage. 

In one of the pioneering studies in this 
field, the nineteenth-century physician Paul 
Broca (1824-1880) determined that injuries in 
the left posterior frontal lobe are associated 
with deficiencies in speech expression but 
not in speech comprehension. This condi¬ 
tion is referred to as aphasia. A century later, 
in 1957, the world saw a striking example of 
the malady. President Dwight Eisenhower 
developed a minor aphasia, probably due to 
a small aneurism in Broca's area. Eisenhower 


recovered sufficiently to complete his term 
of office (until January 1961), although he 
continued to exhibit signs of mild speech 
impairment. Given the extremely demand¬ 
ing nature of the presidency, Eisenhower’s 
performance provides a striking example of 
the dissociation between speech production 
and comprehension. 3 

Broca's observations began a long and 
fruitful line of research in neuropsychology,
the study of the relationship between the 
brain and mental functions. Neuropsycho¬ 
logical findings are highly relevant to the 
study of intelligence, but they are limited in 
an important way. Variations in intelligence 
within the normal range may not be pro¬ 
duced by the same mechanisms that make 
it possible to perform a particular cognitive 
function in the first place. Consider the following
analogy. Loss of a leg makes a dramatic
difference in running speed, yet “number
of legs” is not a determining factor in
running speed within the normal range of 
variation. 

In order to go beyond neuropsychology 
scientists needed some method of examin¬ 
ing the brains of healthy individuals. X-ray 
imaging, developed early in the twentieth 
century, was a beginning; but early X-ray 
images could not provide a clear picture of 
the soft tissue in the brain. In the 1970s much 
more powerful imaging techniques became 
available. They fall into two broad classes: 
techniques for measuring brain structures 
and techniques for measuring neural activ¬ 
ity at different sites in the brain. 4 Panel 7.1 
presents a brief, nontechnical description of 
the major technologies in use today. The 
development of new technologies contin¬ 
ues, so my list will probably be outdated 
within a year of the publication of this book! 
The results from studies using these tech¬ 
niques are exciting, but we want to keep 
three recurring problems in mind. 

3 Eisenhower recovered. Other leaders have not been 
so fortunate. In 1526 King Henry VIII of England 
received a blow on the head while jousting. His 
impulsive behavior following the injury, which dis¬ 
rupted both his personal life and English policy, 
suggests forebrain damage. 

4 Haier, 2009. 
Panel 7.1. Technologies for Looking
at the Brain 

This panel presents a brief view of 
the major technologies used, as of 2010, 
to relate brain structures and processes 
to intelligence. Each of these technolo¬ 
gies was developed from basic discover¬ 
ies in physics that made it possible to 
detect weak electrical and magnetic sig¬ 
nals emanating from the brain. Making 
sense of the signals required the devel¬ 
opment of complicated computer algo¬ 
rithms and extremely rapid electronic 
computing machinery, whose develop¬ 
ment also required basic advances in 
physics. The imaging technologies rep¬ 
resent a striking example of how basic 
research in one field can have important 
practical implications in other fields. 

Technologies for Examining Brain 
Structures 

Computerized Axial Tomography 
(CAT or CT Scanning) 

CT scanning is derived from X-ray medical
imaging. Wilhelm Röntgen discovered X-rays
in 1895 and received the first Nobel Prize in
Physics for the discovery in 1901. This started
a string of Nobel awards building on
Röntgen's discovery. The first
CT scanning devices were announced in 
1972, and the inventors were awarded 
Nobel Prizes in 1979. 

In CT scanning, low-intensity X-rays 
are passed through the body from posi¬ 
tions on a circular ring around the body. 
This contrasts with conventional X-ray 
imaging, in which a single picture is taken 
by passing X-rays through the body onto 
a photographic plate. After the X-rays 
have passed through the body they are 
collected by receivers that are far more 
sensitive than the chemical elements on a 
photographic plate. This permits the use 
of X-rays of low enough intensity that 
their passage is impeded by soft tissue, 


producing a soft-tissue image. The two-dimensional
images, taken from many
different angles, are combined by a com¬ 
puter program that determines the struc¬ 
ture of various parts of the brain. The 
method is applicable to all parts of the 
body, not just to the brain. 

Magnetic Resonance Imaging (MRI) 

Magnetic resonance imaging (MRI) 
sprang from work in the 1960s and 1970s, 
based on discoveries about electromag¬ 
netism that date from the 1930s and 1940s, 
including Nobel Prize-winning research 
by Isidor Rabi. The first medical scan¬ 
ners were introduced in the early 1980s. 
The developers of modern MRI, Paul 
Lauterbur of the University of Illinois
and Peter Mansfield of the University of 
Nottingham, were awarded the Nobel 
Prize in 2003. 

In MRI, the person being scanned is 
placed in a tube that is surrounded by 
a large magnetic field. A radio pulse is 
then directed toward the part of the body 
being scanned. The pulse frequency is 
chosen to resonate with hydrogen atoms. 
This causes hydrogen atoms in the body 
to move out of alignment with the mag¬ 
netic field. They then return to alignment 
and, as they do so, emit a detectable elec¬ 
tromagnetic signal. This information is 
used to reconstruct locations in the brain 
or body. 

Diffusion Tensor Imaging (DTI) 

Diffusion tensor imaging (DTI) is a form 
of magnetic resonance imaging based on 
signals that are sensitive to the diffu¬ 
sion of water molecules along the myelin 
sheaths (white matter) that coat some 
of the neurons in the brain. Signals are 
transmitted along myelinated neurons 
considerably faster than they are along 


unmyelinated neurons (gray matter). 
Columns of myelinated neurons are 
believed to be involved in transmitting 
signals from one region of the brain to 
another, while unmyelinated neurons are 
involved in local computations within a 
region. As a loose analogy, DTI provides 
a way of locating the cabling between the 
computing centers of the brain. 

Technologies Used to Identify 
Brain Activity 

Positron Emission Tomography (PET) 
Scanning 

The basic ideas behind positron emission
tomography were developed in the
1950s. The first use in humans waited until 
the 1970s, due in part to the need for high¬ 
speed computing to support the sensing 
technology. 

As radioisotopes decay they will 
emit positrons (anti-electrons). When a 
positron encounters an electron both are 
annihilated, emitting a gamma ray. In 
PET scanning a person is placed in a 
ring of sensitive sensors. A rapidly decay¬ 
ing radioisotope is then injected into 
the bloodstream. Rapidly decaying iso¬ 
topes are required in order to avoid dam¬ 
age to tissue as the gamma rays pass 
through the body. The isotope will be 
taken up by tissue, roughly in propor¬ 
tion to the metabolic activity in neu¬ 
rons at that point. As the isotope decays 
it emits positrons, which are annihilated 
when they encounter electrons, produc¬ 
ing gamma rays. The gamma rays will 
vary in strength according to tissue den¬ 
sity and current metabolic activity at that 
point. Sensors detect the gamma rays as 
they exit the body. Computer programs 
are then used to calculate metabolic 
activity at various locations. 


Functional Magnetic Resonance 
Imaging (fMRI) 

Like MRI, fMRI is based on molecules 
generating an electromagnetic pulse in 
the presence of a magnetic field. The 
biological mechanism is different. When 
neurons are active they take up oxygen 
from the blood. This causes a change in 
the magnetism of the oxygen molecules, 
which can be detected by sensors. 
The signal is called the Blood Oxygen 
Level Dependent (BOLD) response. The 
BOLD response provides an indication of 
neural activity at a location. 

Electroencephalograms (EEG) and 
Event Related Potentials (ERP) 

Neural events generate electrical signals 
that can be detected by electrodes placed 
on the scalp. This was first demonstrated 
in 1929. It is the basis for the modern
electroencephalogram (EEG). EEG recording
is important in medicine because
physiological states have characteristic 
EEG signatures. These include character¬ 
istic patterns for epilepsy and for different 
stages of sleep. The event-related poten¬ 
tial (ERP) is an EEG signal in response 
to a specific external event, such as a 
flashing light. The ERP has proven to 
be a very useful tool in cognitive psy¬ 
chology, as different mental events have 
characteristic ERP signals. For example, 
when people hear semantically mean¬ 
ingless sentences, such as THE COOK 
ROASTED THE CEMENT, they dis¬ 
play a characteristic ERP. Syntactically 
anomalous sentences, such as WOMAN 
THE LOVE CATS, have a different ERP 
signature. 

Modern EEG signals are recorded from 
50 to more than 100 sites on the scalp. 
Computer-based analysis is required to 
identify signature waveforms and to infer
their location in the brain.
7.2.1. The Spatial and Temporal
Resolution Problem 

The MRI, PET, and CAT procedures descri¬ 
bed in panel 7.1 are accurate to within a 
few millimeters. Therefore, they can detect 
structures and processing activity in bun¬ 
dles of neurons, but not individual neu¬ 
rons. Timing presents a problem. Func¬ 
tional MRI (fMRI), the most widely used 
method, identifies neural activity by mea¬ 
suring oxygen uptake, referred to as the 
BOLD response. The BOLD response takes 
place over a few seconds. By contrast, cog¬ 
nitive operations such as word identification 
take place in well less than a second. There¬ 
fore, the BOLD response is useful in locat¬ 
ing a place of brain action, but less useful in 
isolating the time course of the action. 

There is activity all over the living brain, 
all of the time. In order to determine the 
place of action associated with a particu¬ 
lar cognitive activity neuroscientists have to 
isolate the “signal,” the brain activity associated
with the cognitive action of interest,
from the “noise,” the background activity of
the brain. In order to do this neuroscientists
use an analog of the Donders subtraction 
paradigm described in Chapter 6. The brain 
regions supporting different steps within a 
cognitive activity are located by compar¬ 
ing the BOLD response during an activity 
that employs the function of interest to the 
BOLD response during a control condition, 
which contains all the steps of the first activ¬ 
ity except the one of interest. For exam¬ 
ple, in order to identify the neural struc¬ 
tures associated with lexical identification 
the BOLD response obtained while a per¬ 
son reads words like CAMEL is compared 
to the BOLD response as he or she looks at 
a meaningless letter string, such as LEMAC. 
The BOLD response to LEMAC is sub¬ 
tracted, on a point-by-point basis, from the 
BOLD response to CAMEL. The areas of 
the brain that show more activity in response 
to CAMEL than to LEMAC are assumed to 
be involved in the retrieval of meaning. As 
in the case of the use of Donders’s paradigm 
in reaction time studies, the comparison is 
valid only if the brain regions involved in 


executing the function of interest are unin¬ 
fluenced by the execution of other functions 
that may be active at about the same time. 
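
The subtraction logic just described can be sketched in a few lines of code. This is only an illustration, assuming that each condition has already been reduced to a list of BOLD values, one per brain location. The function names, the threshold, and the numbers are hypothetical and are not taken from any imaging package or study.

```python
def subtract_conditions(task, control):
    """Point-by-point difference: task-condition signal minus control-condition signal."""
    if len(task) != len(control):
        raise ValueError("both conditions must cover the same locations")
    return [t - c for t, c in zip(task, control)]


def locations_more_active(task, control, threshold):
    """Indices of locations whose task-minus-control difference exceeds the threshold."""
    differences = subtract_conditions(task, control)
    return [i for i, d in enumerate(differences) if d > threshold]


# Hypothetical BOLD values at four locations (arbitrary units).
camel = [1.0, 2.5, 1.1, 3.0]   # reading a word such as CAMEL
lemac = [1.0, 1.0, 1.0, 1.2]   # viewing a letter string such as LEMAC

print(locations_more_active(camel, lemac, threshold=0.5))  # [1, 3]
```

Locations 1 and 3 survive the subtraction because they respond more strongly to the word than to the letter string. The sketch inherits the assumption stated above: the difference is interpretable only if the control condition really contains every step except the one of interest.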

EEG/ERP recordings of electrical events 
are accurate to within a few milliseconds. 
The neural response to a stimulus will fol¬ 
low presentation of a light or noise by a few 
milliseconds, simply because there has to be 
time for the signal to get to the brain before 
it can be interpreted. It is difficult to deter¬ 
mine the place in the brain that is the source 
of an EEG/ERP signal unless a large number 
of electrodes are placed on the scalp. Large 
electrode arrays did not become practical 
until the advent of high-speed computers, 
but this is no longer a serious concern. 

7.2.2. The Averaging Problem 

All of these technologies rely on the analy¬ 
sis of signals immersed in noise. Therefore, 
results are often reported in terms of the 
average signal observed, where the averag¬ 
ing is over trials and sometimes over indi¬ 
viduals. A technology that may be sensi¬ 
tive enough to detect trends in averages may 
not be sensitive enough to detect individual 
differences around that trend. Accordingly, 
someone interested in individual differences 
in cognition - that is, intelligence - can 
rely on positive results obtained using the 
new technologies (subject, of course, to the 
usual cautions about the need for replica¬ 
tion), but should be a bit cautious about 
results that report no difference. Are such 
results obtained because there really are no 
differences between individuals or because 
the technologies being used are not sensitive 
enough to detect them? 

Average patterns of brain activity, com¬ 
puted over individuals, may not represent 
the pattern of activity in any one individ¬ 
ual. For instance, male and female patterns 
of brain activity differ in a number of cog¬ 
nitive tasks. Therefore, it often does not 
make sense to report average brain activity, 
summed across men and women. This is a 
glaringly obvious distinction, for we know 
that there are many differences between 
men and women, and so are alert to errors 
based on averaging results across the sexes. 
What is more worrisome is that there may be
qualitatively different types of brain activi¬ 
ty within each gender that are masked in 
averaged data. 
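
A small numerical sketch shows how an average can fail to describe anyone in the group. The data are invented for illustration: four hypothetical people, each with an activity value at two brain locations. The function name is an assumption of this sketch.

```python
from statistics import mean


def average_pattern(individual_patterns):
    """Average activity at each location across individuals."""
    return [mean(values) for values in zip(*individual_patterns)]


# Hypothetical activity at two locations for four people.
# Two people rely mainly on location 0, two mainly on location 1.
patterns = [
    [2.0, 0.0],
    [2.0, 0.0],
    [0.0, 2.0],
    [0.0, 2.0],
]

group_average = average_pattern(patterns)
print(group_average)               # [1.0, 1.0]
print(group_average in patterns)   # False: nobody shows the average pattern
```

The averaged data suggest moderate activity at both locations, although every individual in this made-up sample uses only one of them.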

7.2.3. Logistical Problems 

All the new technologies except the EEG 
rely on large, expensive, immobile machin¬ 
ery. This is not a problem that is likely 
to be solved by technology, for the prob¬ 
lems are due to the physical nature of the 
sensing apparatus. All the technologies use 
machines that must be shielded from extra¬ 
neous electromagnetic fields. Magnetic res¬ 
onance imaging requires a large, inherently 
heavy magnet. 

Imaging experiments are time-consuming 
and involve some physical discomfort for 
participants. This can range from injections 
of radioactive materials (in PET scanning) to
holding still in a noisy, uncomfortable cham¬ 
ber (in MRI). Therefore, there are pres¬ 
sures to use a small number of participants. 
For instance, one important study in the 
field reported correlations based on only 
eight participants. Thirty to fifty participants 
would be considered a large study. For sta¬ 
tistical reasons alone, positive results can be 
trusted but negative results are indetermi¬ 
nate, due to the low statistical power of the 
experiment. In addition, using very few par¬ 
ticipants, often drawn from available rather 
than representative groups, raises the pos¬ 
sibility that there are important individual 
differences in brain-behavior relations in the 
population at large that are not represented 
in the study group. 
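
The effect of sample size on what a correlation can tell us may be illustrated with a rough calculation. The sketch below uses the Fisher z-transformation to approximate a 95% confidence interval around an observed correlation. That technique is an assumption introduced here for illustration, not a method described in the studies under discussion, and the observed correlation of .50 is hypothetical.

```python
import math


def correlation_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation r from n participants,
    using the Fisher z-transformation (an assumption of this sketch)."""
    if n <= 3:
        raise ValueError("the approximation needs more than three participants")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


for n in (8, 50):
    low, high = correlation_interval(0.5, n)
    print(n, round(low, 2), round(high, 2))
```

With eight participants the interval runs from roughly -.32 to .89, so the data are compatible with anything from a modest negative relation to a very strong positive one. With fifty participants the interval narrows to roughly .26 to .68. This is why a failure to find a relation in a very small study says little about whether the relation exists.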

Because of the expense involved and the 
need to study relatively few participants, 
many studies are based on comparisons of 
extreme groups - often a contrast between 
“normals” (usually healthy, relatively young 
adults) and clinical populations. While such
studies can be informative, they do exag¬ 
gerate the size of findings. They can also 
miss important phenomena that appear only 
when the full range of intelligence is studied. 

The new imaging and EEG technologies 
have provided types of data that the psycho¬ 
metricians of the twentieth century could 


only speculate about. Findings based on the 
new technologies have made, and will con¬ 
tinue to make, great strides in our under¬ 
standing of the biological basis of intelli¬ 
gence. Like any other technologies, these 
new ways of looking at the brain do have 
limits, and the limits should be kept in mind. 

7.3. Brain Structures and Cognition 

It would be nice if thinking were located 
in neat compartments in the brain. As the 
orchestra analogy suggests, this is a vain 
hope. Consider an important part of cogni¬ 
tion, paragraph comprehension, that is eval¬ 
uated in almost all battery-type intelligence 
tests. 

From the viewpoint of a psychometri¬ 
cian, a test of paragraph comprehension pro¬ 
duces a score that is an indicator of verbal 
intelligence. From the viewpoint of a cognitive
psychologist or a neuroscientist, paragraph
comprehension is staggeringly complex.
It involves aspects of (at least) visual
form identification (occipital and temporal
lobes), control of eye movements (parietal
lobes), retrieval of lexical information (left
temporal lobe), retrieval of syntactic rules
(left temporal lobe), execution of syntacti¬ 
cal rules for sentence comprehension (left 
frontal lobe), and maintenance of attention 
on the topic (left frontal lobe, anterior cin¬ 
gulate cortex). It takes all of this, and more, 
to read The Cat in the Hat to a four-year-old 
child. 

Nevertheless, we can make some gener¬ 
alizations about the functions of different 
brain areas: 

Frontal cortex: The frontal lobe is involved 
in two major aspects of thought: the pro¬ 
vision of working memory and the control 
of attention. Both working memory func¬ 
tions and the control of attention involve 
coordination between the frontal and pari¬ 
etal lobes, and the anterior cingulate gyrus 
(Figures 7.1 and 7.2). The frontal lobe is also 
important in inhibition of action, includ¬ 
ing inhibition of emotion-guided responses. 
Inhibiting such responses can be impor¬ 
tant, both in constructing plans and in 
Panel 7.2. The Case of Phineas Gage

Phineas Gage was a nineteenth-century 
railway construction foreman. He was 
known as a good supervisor and a shrewd 
businessman. In 1848 an accidental explo¬ 
sion drove a steel rod through the for¬ 
ward part of Gage's skull. After he recov¬ 
ered he was irreverent, capricious, unable 
to concentrate, and often profane. Some 
reports remark on his tendency to make 
sexual comments, something that was far 
more proscribed in Victorian times than 
it is today. Gage lost his job as a fore¬ 
man and became a stage coach driver. 
This job probably minimized social inter¬ 
actions, compared to his earlier job, but 
still required considerable skill. Ten years 
after the accident he developed epilepsy. 
He died in 1860. During his lifetime there
was interest in his case. As late as the
1990s, Gage's skull was on display in
Harvard’s Library of Medicine. 

Over a hundred years after Gage’s 
death modern neuroscientists examined 


the skull and concluded that the iron bar 
had heavily damaged the frontal lobe.* 

Here is a personal anecdote. For some 
years I supervised an undergraduate lab¬ 
oratory course in cognitive psychology. 
One day when I entered the laboratory 
I found that a graduate student leading 
one of the sections, a small woman, had 
been trapped in the corner of the room 
by a large man who was berating her 
for giving him low grades. His behavior 
was threatening and completely inappropriate.
I intervened, and convinced him
to leave. (Because he also exhibited very 
poor motor coordination I was sure that 
I was in no physical danger.) I noticed 
that he had scars on his forehead. Upon 
later inquiry I learned that he had suf¬ 
fered frontal lobe damage. 

These two examples illustrate the 
impulsivity and inability to concentrate 
that are typical of people with damaged
frontal lobes. 

* Damasio et al., 1994. 


maintaining social order. People with dam¬ 
age to the frontal lobes are notoriously 
poor in both functions. Panel 7.2 presents 
a famous illustrative case and a relevant per¬ 
sonal anecdote. 

Temporal lobe: The temporal lobes are 
heavily involved in hearing. The left tem¬ 
poral lobe and the left posterior frontal lobe 
are important for the production and com¬ 
prehension of speech. There is also some 
involvement of speech in the right tem¬ 
poral lobe. In about five percent of the 
population the speech areas are found on 
the right rather than the left. Such peo¬ 
ple are almost always left-hand dominant. 
(The converse is not true. Many left-handers 
have their speech center in the left hemi¬ 
sphere.) The posterior temporal lobe is also 
involved in the identification of visual stim¬ 
uli, as explained in the following discussion 
of the occipital lobe. 


Parietal lobe: The parietal lobes were once 
thought of as being responsible for analysis 
of tactile signals and for coordination of 
sensory-motor movements. These functions 
are carried out contralaterally; the left side of 
the brain receives sensations from and con¬ 
trols actions on the right side of the body, 
and vice versa. We now know that the pari¬ 
etal lobe is also involved in the allocation 
of attention to sensory input streams. In 
addition, the parietal lobe plays an impor¬ 
tant role in locating objects and in sens¬ 
ing motion. It acts in conjunction with the 
frontal lobe to support the working memory 
system. 

Occipital lobe: The occipital lobe is spe¬ 
cialized for visual analysis. This is where 
most “low-level” visual analysis takes place,
including parsing the visual signal into con¬ 
nected surfaces and objects. During percep¬ 
tion visual information from the occipital 
Panel 7.3. The Case of H. M.

H. M. was a young Canadian architect’s 
draughtsman who, in 1953, had his hip¬ 
pocampus, amygdala, and other parts of 
the limbic system removed in order to 
treat severe epileptic seizures. H. M. was 
the last person on whom this operation 
was ever performed. The surgery did con¬ 
trol his epilepsy, but it destroyed his abil¬ 
ity to form new declarative memories. As 
a result he had to live in custodial care 
until he died, in 2008, when he was in his 
eighties. At that time his identity, Henry 
Molaison, was revealed.

H. M. was studied extensively. One of 
the most interesting aspects of his behav¬ 
ior was the highly selective nature of his 
loss. After his operation his WAIS IQ 
score was in the above-average range. 
This is what would be expected, given 
his employment prior to surgery. Declar¬ 
ative memory - that is, the sort of mem¬ 
ory that can be retrieved by an explicit 
cue to recall - was gone. For example, 
he could not recognize researchers who 
had worked with him for over ten years. 


However, he could learn new motor 
skills, although he did not remember hav¬ 
ing learned them. He also showed some 
semantic memory for events that had 
occurred after his injury. Such memory 
was reduced compared to the memories 
of a noninjured person. 

Although surgeons no longer perform 
bilateral hippocampal removals, a few 
cases of hippocampal loss due to injury 
or accident have been reported. They all 
resemble H. M.* 

A good deal has been made of H. M.'s 
above-normal IQ scores, which have 
been used to argue that intelligence is 
unaffected by the loss of declarative 
memory. This conclusion defies logic. A 
capacity for declarative memory is an 
essential part of the concept of intelli¬ 
gence. Failure to evaluate it adequately 
on the WAIS is a deficiency of the test. 
You cannot say that a man who has to be 
provided custodial care due to a cognitive 
defect still has normal intelligence. 

* Milner, 2005; Milner, Corkin, & Teuber, 1968. 


lobe moves along a pathway through the 
temporal lobes that is responsible for iden¬ 
tifying visual objects, and along a pathway 
through the parietal lobes that determines 
object location and movement. 

Two subcortical structures are important 
in cognition (Figure 7.2):

The limbic system: The limbic system con¬ 
sists of the hippocampus, the amygdala, and 
the fornix. The limbic system, and espe¬ 
cially the amygdala, is heavily involved in 
emotional arousal, including fear. This is 
important for cognition, because affective 
reactions are important in capturing atten¬ 
tion and in response selection. One of the 
important functions of the frontal and pre¬ 
frontal regions is the inhibition of reactions 
based on affect, thus making cooler-headed 
reasoning possible. 


The hippocampus is central to the construction
of declarative memory, memories
that can be recalled explicitly. People with 
hippocampal injuries may exhibit a pro¬ 
found anterograde amnesia, in which they 
are literally unable to form memories of 
experiences following their injury. Panel 7.3 
presents the case of H. M., which may 
be the most-cited single case study in all 
of psychology. The hippocampus is also 
heavily involved in the learning of spatial 
layouts. 

Cingulate gyrus: This large structure lies 
between the limbic system and the upper 
parts of the cerebral cortex (Figure 7.2). The 
cingulate cortex and the frontal system form 
a circuit that is important for the control of 
attention, in the sense of attending to cer¬ 
tain aspects of the current problem and not 
others. This is sometimes referred to as
“executive control.” 5

This description of the functions of dif¬ 
ferent parts of the brain has not dealt with 
individual differences. We now turn to a dis¬ 
cussion of how variations in brain structures 
and processes are associated with individ¬ 
ual differences in cognition, that is, with 
intelligence. 

7.4. The Brain and General 
Intelligence: g and Its Correlates 

In a classic children’s story Winnie-the-Pooh 
explains his mishaps by saying he is a “Bear
of very little brain.” 6 Is it true that the big¬ 
ger our brains, the smarter we are? Panel 7.4 
presents two cases in which men with large 
brains displayed considerable intelligence. 
But wait! The panel also contains a discus¬ 
sion of the problems of inferring scientific 
laws from single case studies. The answer 
to the question about brain size and intelli¬ 
gence depends on your perspective. Does a 
large brain imply greater intelligence? From 
an evolutionary point of view the answer is
“Definitely!” Within our species, the answer
is “Somewhat.” It also depends on what type
of thinking we are talking about and where 
in the brain we look. 

7.4.1. Evolutionary and Cross-species 
Evidence 

Cross-species and evolutionary records indi¬ 
cate a strong relationship between brain 
size and cognitive power. Compared to the 
brains of other mammals, the human brain 
is relatively large, and very large for our body 
size. Adult men have brains that are slightly 
above 1400 cc in volume and 1.30 kilograms
in mass. Women have brains around eight 
to ten percent smaller. Brains also vary with 
age; they are obviously smaller in children, 
and shrink in advanced age. Brain size is 
positively related to body size, both within 
and across species. A comparison to another 

5 Posner et al., 2006. 

6 Milne, 1926, Chapter 4. 


large animal, the elephant, shows the human 
advantage in brain size after body size is 
considered. 

The average weight of the elephant brain 
(which varies depending on whether the 
animal is male or female, and African or 
Asian) is 4.70 kg, more than three times the
weight of the human brain. The elephant 
weighs from 7,500 kg (African male) to 3,700 
kg (Asian female). Modern humans weigh 
around 75 kg (male) to 62 kg (female). Ele¬ 
phants are sixty to one hundred times heav¬ 
ier than humans, which makes the three-to- 
one ratio of brain weights look a little less 
important. 

This example generalizes nicely. Across 
species there is a remarkably accurate rela¬ 
tionship between brain size and body size, 

$E(B_i) = A_g S_i^{r_g}$,

where $E(B_i)$ is the predicted brain mass, in
grams, of the ith species, and $S_i$ is body mass
in grams. This is called an allometric equation.
The $A_g$ and $r_g$ terms are constants that
depend upon the particular group of animals,
g, being compared. If the exponent,
$r_g$, is less than one, brain size is a
negatively accelerated function of body size,
which means that the increase in brain size
per unit weight of body size decreases as
the animal gets larger. The human:elephant
contrast illustrates this. The $r_g$ term is about
.75 across mammals, and .56 across birds.

Within a group of related species some 
animals will have larger brains than oth¬ 
ers, after accounting for their body sizes. 
The amount of variation that a particular 
species displays, compared to others in the 
same group, is indexed by the encephaliza- 
tion ratio, Q, which is defined as the ratio of 
the observed brain size to the size that would 
be predicted on the basis of body size,

$Q_i = B_i / E(B_i)$,

where $B_i$ is the observed brain mass of the ith species.

If Q is greater than one, the species has a 
larger-than-expected brain size, calculated 
by considering brain size:body size ratios in 
comparable animals. 
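
The allometric equation and the encephalization ratio can be put to work on the human:elephant comparison. The text gives the mammalian exponent (about .75) but not the constant for mammals, so the sketch below compares the two species' ratios directly, in which case the constant cancels. The function names are hypothetical, and the elephant figures mix an average brain weight with the body weight of an African male, so the result is a rough illustration rather than a measured value.

```python
def predicted_brain_mass(body_mass, a_g, r_g):
    """Allometric prediction E(B) = A_g * S ** r_g (masses in grams)."""
    return a_g * body_mass ** r_g


def encephalization_ratio(brain_mass, body_mass, a_g, r_g):
    """Q: observed brain mass divided by the mass predicted from body mass."""
    return brain_mass / predicted_brain_mass(body_mass, a_g, r_g)


def relative_encephalization(brain_a, body_a, brain_b, body_b, r_g):
    """Q of species a divided by Q of species b; the constant A_g cancels."""
    return (brain_a / brain_b) / ((body_a / body_b) ** r_g)


# Figures from the text, in grams: male human vs. male African elephant.
# The elephant brain mass is the overall average quoted in the text.
human_brain, human_body = 1_300, 75_000
elephant_brain, elephant_body = 4_700, 7_500_000

ratio = relative_encephalization(
    human_brain, human_body, elephant_brain, elephant_body, r_g=0.75
)
print(round(ratio, 1))  # 8.7
```

On these figures the human encephalization ratio is nearly nine times that of the elephant, even though the elephant's brain is more than three times heavier in absolute terms.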
Panel 7.4. Two Case Studies of Big
Brains, and a Comment on Case 
Studies in General 

Albert Einstein 

The theoretical physicist Albert Einstein 
(1879-1955) is often held up as the proto¬ 
typical example of high intelligence. Prior 
to his death Einstein gave permission for 
the scientific study of his brain. Einstein's 
brain was not remarkable, for a man of his 
age at time of death (seventy-six), except 
in one way. His parietal cortex was about 
15% larger than that of other men
who had died at about the same age.* 
Both the scientists who conducted the 
study and the popular press made a good 
deal of this, observing that the parietal 
cortex is involved in visual-spatial reasoning
and mathematical reasoning, and that
Einstein had said that his thoughts tended 
not to be verbal statements. 

The First King of Scotland 

Robert Bruce (Robert I of Scotland, 1274-
1329) had a spectacular career. In 1309 he
was supported by the Scottish church 
as King of Scotland, even though he 
had been excommunicated for murder¬ 
ing a political rival. In 1314 he defeated a 
much larger English force at the Battle of 
Bannockburn, establishing Scotland as an 
independent kingdom. 

A modern Scot, Ian Deary, used mag¬ 
netic resonance imaging (MRI) of a cast of 
Bruce’s skull to estimate Bruce’s cranial 
capacity.‡ He then determined the relation
between cranial capacity estimates
and intelligence, using a modern survey of 
forty-eight adults, for whom he had both 
an intelligence estimate (a reading com¬ 
prehension test) and measures of skull 
size. Bruce had an estimated intracranial 
volume of 1661.67 cc; the mean of the
modern sample was 1492.50 cc. Using the
cranial capacity:intelligence relation in the
modern sample as a guide, Deary and his
colleagues estimated Bruce's IQ to be 128,
approximately two standard deviations 
above average. Deary concluded that 
his estimate was consistent with Bruce’s 
demonstrated skills as a political and mil¬ 
itary leader. 

A Comment on Case Studies 

Case studies make great reading. They 
are often useful in suggesting hypotheses 
about brain action, because they tell us 
what variables are worth examining fur¬ 
ther. What case studies do not provide 
is the sort of careful examination that 
rules out alternative hypotheses. Con¬ 
sider what the cases of Phineas Gage 
(panel 7.2), H. M. (panel 7.3), and Ein¬ 
stein and Robert Bruce contribute to sci¬ 
entific knowledge. 

The case study of H. M. made sub¬ 
stantial contributions to our understand¬ 
ing of the mechanisms of memory. But 
this case study was an unusual one. The 
initial report of H. M.’s cognitive deficits 
was followed by years of careful, well- 
controlled studies of his cognitive capac¬ 
ities. Similar cases were sought out and 
observed. The study of H. M. was valu¬ 
able because extensive observations were 
possible, conceptual replications could be 
made, and various alternative hypotheses 
could be explored. 

There was no systematic study of 
Phineas Gage. It has been possible to 
determine, to some extent, what his 
injuries were, but we have only anecdo¬ 
tal evidence about his behavior. The evi¬ 
dence is moderately strong that he got 
into more trouble after the accident than 
before, but Gage’s behavior was never 
recorded in anything like the detailed 
records we have for H. M. However, 
we do have well-documented records 
of people whose injuries were similar 
to those of Gage. Gage's story makes 
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a nice illustration of points about brain 
injury, but it is not scientific data. 

The analysis of Einstein’s brain is inter¬ 
esting, but hardly conclusive. Many mea¬ 
surements were made comparing Ein¬ 
stein’s brain to the brains of lesser-known 
people. If many measurements are made, 
some of them are likely to meet con¬ 
ventional standards for statistical signif¬ 
icance by chance alone. The fact that 
the difference was in the parietal lobe 
is suggestive, for neuroscientific studies 
have established that the parietal lobe 
is involved in visual imagery and arith¬ 
metic. Einstein did claim that his internal 
thought processes were visual rather than 
verbal. However, self-reports of imagery 
are not highly correlated with objective 
indicators of imagery.§ 
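
The point about multiple measurements can be made concrete with a short calculation. This is a sketch under a simplifying assumption of my own, namely that the comparisons are statistically independent; the numbers of tests shown are hypothetical and are not the number of measurements made on Einstein's brain.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent tests of true null
    hypotheses reaches p < alpha purely by chance."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for m in (1, 5, 10, 20):
        print(m, round(prob_at_least_one_false_positive(m), 3))
```

With twenty independent comparisons at the conventional .05 level, the chance of at least one spurious "significant" difference is well above one half.
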

Deary and colleagues’ estimate of
King Robert’s IQ is fun. Their conclusion
is consistent with modern studies
of brain size:intelligence relations,
although Deary and colleagues’ estimate
of the IQ:cranial capacity relationship
was somewhat higher than is normally
found. While their estimate of
Bruce’s IQ was 128, the margin of error 
was broad, with confidence intervals 
from 106 to somewhere above 130. Most 
national leaders probably have IQs in this 
range. IQ estimates for the first forty- 
two US presidents run from a high of 
145-160 (Jefferson) to a low of 108-140 
(Harding).** 

And finally, whenever you deal with 
case studies, you have to be concerned 
about case studies that illustrate the 
opposite point, but that you are not told 
about. There are cases of eminent men 
who had rather small skulls.†† But that
does not make good press. 

* Witelson, Kigar, & Harvey, 1999. 

† Ganis et al., 2004; Piazza & Dehaene, 2004.

‡ Deary et al., 2007.

§ Poltrock & Brown, 1984.

** Simonton, 2006.

†† See Gould, 1981, pp. 92 ff. for examples.


But what are “comparable animals”? The 
$A_g$ and $r_g$ parameters differ across classes of
mammals. Therefore, it is more useful to 
compare the relative encephalization indices 
between two species, within a reference 
class. First the encephalization ratios are 
computed for all species within the class 
of interest. Then one species is arbitrarily 
assigned an encephalization value of one. 
Other species in the class are compared to 
that species, according to the equation

$$R_j = \frac{Q_j}{Q_i},$$

where j is the species of interest and i is the 
index species for the reference group. 
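
As a minimal sketch, the relative index is just each species' encephalization ratio divided by that of the index species. The Q values below are hypothetical, chosen only so that the resulting ratio resembles the chimpanzee figure quoted later; the function name is my own. Note that when both species are evaluated with the same class constants, the constant A_g cancels out of the ratio.

```python
def relative_encephalization(q_by_species: dict[str, float],
                             index_species: str) -> dict[str, float]:
    """Return R_j = Q_j / Q_i for every species j, where i is the index species."""
    q_index = q_by_species[index_species]
    return {name: q / q_index for name, q in q_by_species.items()}


# Hypothetical encephalization ratios, for illustration only.
example_q = {"human": 7.0, "chimpanzee": 2.1}
relative = relative_encephalization(example_q, "human")

if __name__ == "__main__":
    for name, r in relative.items():
        print(f"{name}: {r:.2f}")
```

By construction the index species always receives a value of one, and every other species is expressed as a fraction (or multiple) of it.
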

Within the nonhuman primates, reports 
of innovation, tool use, and social reason¬ 
ing increase with relative brain volume. The 
correlations are in the .5 range and above. 7 

7 Reader & Laland, 2002. 


This suggests taking a closer look at our own 
reference group, the great apes and, more 
specifically, the hominids. 

Using modern humans (Homo sapiens)
as the index species ($R_{human} = 1$) for the
great apes, the chimpanzee has a relative
encephalization index of .3. Australopithecus,
the genus that preceded the genus Homo, 
and that last lived about 2.3 million years 
ago, had a relative encephalization index in 
the .4 to .45 range. Our immediate evolu¬ 
tionary ancestors, Homo erectus, first appear¬ 
ing about 1.8 million years ago, had an 
encephalization ratio of slightly less than .8. 
The fossil record indicates rapid increases 
in encephalization beginning about 500,000 
years ago, with the first appearance of rep¬ 
resentatives of Homo sapiens. Our sometime 
contemporary, but now extinct “cousins” 
(i.e., descendants of a common ancestor, 
but not ancestral to us), the Neanderthals, 
had an encephalization ratio close to 1. (The 
exact value is hard to determine because 
we have only a few adequate fossils. Neanderthal
brains were somewhat larger than
modern human brains, but their bodies were 
also larger.) Encephalization appears to have 
peaked within Homo sapiens about 20,000 
years ago, and may have decreased slightly 
since then. 8 

If you accept the idea that the progression 
from throwing sticks and making stone tools 
to throwing bombs and making plastic tools 
represents an increase in intelligence, then 
there is clear evidence that across species 
increases in brain size parallel an increase in 
intelligence. But does this relationship apply 
to individuals within the same species, us? 

7.4.2. Intelligence and Variations in Brain 
Size in Humans 

In the early nineteenth century there was 
widespread interest in phrenology, a pseudo¬ 
science whose followers claimed that they 
could diagnose intelligence and personal¬ 
ity from the size and shape of the skull. 
They had little evidence to back up their 
assertions. The first objective evidence for 
a within-species brain size:intelligence rela¬ 
tionship was obtained when Galton, and 
then his colleague Karl Pearson, found a cor¬ 
relation of .11 between Cambridge students' 
skull sizes and their grades. 9 

Galton faced a handicap. His study par¬ 
ticipants were alive. I am not being entirely 
facetious. In Galton's time the only way to 
measure cranial capacity, and by inference 
brain size, in a living person was to make a 
measure on the exterior of the skull. This 
method of measurement confounds thick¬ 
ness of the skull and cranial capacity. 

To overcome this handicap several 
researchers, both in Galton's time and since, 
have estimated brain size by direct measure¬ 
ment of cranial capacity post mortem. This 
line of research was so popular in the late 
nineteenth and early twentieth centuries 
that it led to a lively - and, to the modern 
eye, quaint - technical dispute over the rela¬ 
tive merits of measuring brain size by filling 

8 Geary, 2005, pp. 50-54. 

9 Citation by Rushton, 1992. 


the cranial cavity with bird seed, shotgun 
pellets, or sand. An obvious shortcoming 
of such studies is that the investigators sel¬ 
dom had measurements of how intelligent 
the people whose crania they were exam¬ 
ining had been when those people were 
alive. This did not stop the early researchers. 
They simply assumed that certain groups, 
usually Europeans as compared to non- 
Europeans, but sometimes men as compared 
to women, were smarter than others. The 
investigators then pointed to intergroup dif¬ 
ferences in cranial capacity as proof of their 
assumptions. 

In 1981 the Harvard paleontologist 
Stephen J. Gould wrote an elegant, amusing, 
and scathing review of this line of research, 
as part of a more general, and equally 
scathing, analysis of virtually all aspects of 
research on intelligence. 10 He concluded 
that there was no reliable evidence relating 
brain size to intelligence. He also claimed 
that attempts to show group differences in 
brain size were motivated by racial preju¬ 
dice. Gould had previously achieved con¬ 
siderable public credibility as a commen¬ 
tator on science, so his views were widely 
accepted in spite of negative reviews of his 
work in the technical literature. 11 

It is impossible to know whether Gould's 
imputations about investigators’ motives 
were correct or not. Indeed, if the early 
investigators' facts were right it does not 
matter to science what their motives were. 
There have been several careful studies 
about the brain size-intelligence relation¬ 
ship. Gould had his facts wrong. 

In 1992 J. Philippe Rushton, a professor at
the University of Western Ontario, analyzed 
records of the cranial capacity in American 
service men and women, taken as part of 
the physical examinations for military ser¬ 
vice. He found that the mean intracranial 
volume in officers was greater than that in 
enlisted men, even after considering body 
size. It is tempting to say that what one 
makes of this depends upon whether one 

10 Gould, 1981, 1996. 

11 For an exceptionally comprehensive review, see 

Carroll, 1995. 
served as an officer or as an enlisted person.
More seriously, though, there is little
doubt that the average IQ test score for offi¬ 
cers is higher than the average for enlisted 
service men and women. In 1996 Rushton 
and his colleague C. D. Ankney published a 
review of similar studies that had been done 
to that date. They concluded that there is a 
correlation of .15 between estimates of cra¬ 
nial capacity based on skull measurements 
and measured intelligence. 12 Rushton also 
found differences between men and women, 
and between racial groups. These are dis¬ 
cussed in Chapter 11. 

In 1991 Lee Willerman, of the University
of Texas (Austin), reported a reliable,
positive correlation between intelligence 
and brain size measured using imaging 
techniques. 13 Willerman’s study was fol¬ 
lowed by others. In 2004 a meta-analytic 
review concluded that the brain size-IQ 
correlation is about .33. 14 As the correla¬ 
tion between external estimates of the size 
of the crania and imaging estimates of the 
volume of the brain is about .5, we would 
expect the cranial estimate-IQ correlation 
to be approximately .16, in agreement with 
Rushton and Ankney’s finding. More recent 
studies have suggested that the relationship 
is primarily driven by the density of gray 
matter, indicating the importance of local 
connections within each region of the brain. 
However, there does appear to be a smaller, 
but reliable, correlation with the density of 
white matter. 15 
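
The arithmetic behind the expected cranial estimate-IQ correlation can be written out explicitly. The sketch below rests on an assumption I am making for illustration: that external skull measurements are related to IQ only through their relation to brain volume, so that the expected correlation is the product of the two links. The variable names are my own.

```python
r_brain_iq = 0.33      # imaging-based brain size vs. IQ (meta-analytic value in the text)
r_skull_brain = 0.50   # external skull estimate vs. imaged brain volume (from the text)

# Assumption: skull size relates to IQ only by way of brain volume,
# so the expected skull-IQ correlation is the product of the two correlations.
r_skull_iq_expected = r_skull_brain * r_brain_iq

if __name__ == "__main__":
    print(round(r_skull_iq_expected, 3))
```

The product is close to the .15 that Rushton and Ankney reported from skull measurements.
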

As far as I have been able to determine, 
there is only one study in which investigators 
were able to correlate post mortem mea¬ 
sures of brains with levels of intelligence. 16 
Such a study is important, because direct 
measures of parts of the brain are more accu¬ 
rate than the estimates obtained through 
imaging. The results were confirming in 
some ways and confusing in others. There 
was a .6 correlation between overall brain 
volume and measures of verbal intelligence, 

12 Rushton and Ankney, 1996. 

13 Willerman et al., 1991. 

14 McDaniel, 2005. 

15 Luders et al., 2009. 

16 Witelson, Beresh, & Kigar, 2006.


in right-handers. This is on the high side of 
the measures reported in the various meta¬ 
analyses. There was a much smaller relation 
between spatial-visual reasoning and brain 
volume in women, and essentially none in 
men. These findings indicate that in addi¬ 
tion to size differences there may be quali¬ 
tative differences in the way that the brain 
is organized in men and in women, and in 
right-handers as compared to left-handers. 
We explore this issue in the next section. 

7.4.3. Structural Differences in Various 
Regions of the Brain and Their Relation 
to General Intelligence 

While it would be naive to think that intel¬ 
ligence is located in any one place in the 
brain, it is reasonable to believe that differ¬ 
ent parts of the brain participate to differ¬ 
ent degrees in producing intelligence. Brain 
density can be measured, on a region-by¬ 
region basis, using modern imaging tech¬ 
niques. When this is done there is a shift 
from measuring volume to measuring den¬ 
sity of either gray or white matter, as dis¬ 
cussed earlier. Definitive studies are diffi¬ 
cult to accomplish, because the expense of 
imaging makes it difficult to obtain a large 
sample. This reduces statistical power, lead¬ 
ing to a concern that small to moderate 
correlations may not be discovered. Never¬ 
theless, some good studies have been done, 
and some things have been learned. Unfor¬ 
tunately, the results do not present a clear 
picture. 
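
To see why small imaging samples are a concern, consider a rough power calculation. This is a sketch using the standard Fisher z approximation for a two-sided test of a correlation; the sample sizes and the true correlation of .3 are hypothetical values I chose for illustration, and the function name is my own.

```python
from math import atanh, sqrt
from statistics import NormalDist


def power_for_correlation(rho: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power to detect a true correlation rho with n cases,
    using the Fisher z transformation (standard error 1 / sqrt(n - 3))."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = atanh(rho) * sqrt(n - 3)
    # Probability that the test statistic falls beyond either critical value.
    return (1 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)


if __name__ == "__main__":
    for n in (20, 50, 100, 200):
        print(n, round(power_for_correlation(0.3, n), 2))
```

Under these assumptions a study of twenty people would miss a true correlation of .3 most of the time, while a study of a hundred would usually detect it.
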

A wide-ranging review concluded that 
there were substantial correlations between 
measures of cortical density and intelligence, 
in a large number of cortical areas. 17

Three studies have been conducted on 
large samples, chosen to be representative 
of the populations of the US (two studies)
and Spain (one study). 18 These studies
involved correlations using a general mea¬ 
sure of intelligence, either g or a compos¬ 
ite score, such as the full-scale intelligence 

17 Luders et al., 2009. 

18 Colom et al., 2009; Haier et al., 2009; Karama et al., 

2009. 
quotient (FSIQ) of the WAIS. Whether
or not the results were consistent depends
on how closely you look. All the studies
found that the density:IQ correlations
were highest for areas within the frontal 
and parietal regions, and in the cingulate 
gyrus and the limbic system. However, they 
disagreed about precisely where the activ¬ 
ity was within these broad brain areas. To 
illustrate, in one of the studies there were 
reliable correlations between density of gray 
matter and a g score in over forty different 
local regions of the brain. In another, using a 
different test battery that had a similar factor 
structure, there were reliable correlations in 
fifteen different areas. Reliability hardly cap¬ 
tures the result; the probability levels for the 
findings are in the p < .005 region. How¬ 
ever, there were only six areas of overlap, 
in which the same region was implicated in 
both studies. 19 

This is consistent with an earlier review, 
in which Richard Haier, at the University 
of California, Irvine, and his colleague Rex 
Jung, at the University of Arizona, deter¬ 
mined the percentage of studies in which 
reliable relations had been found between 
the volume or density of a brain region and 
a measure of intelligence. 20 There was only 
one area of the brain, in the left inferior pari¬ 
etal lobe, 21 where 50% of the studies of brain 
imaging had shown a relation between size 
and intelligence. The densities of areas in 
the anterior frontal cortex, other parts of the 
parietal cortex, and regions in the cingulate 
cortex were correlated with general intelli¬ 
gence scores in 30% of the studies. Most of 
the areas showing effects were in the left 
hemisphere, reinforcing the idea that the 
sorts of information processing required for 
conventional intelligence tests and reason¬ 
ing problems are carried out primarily by 
the left side of the brain. 

Why would such inconsistent results be 
obtained? There are three possible explana¬ 
tions. The least interesting explanation, but 
one that cannot be ruled out, is a statistical 

19 Haier et al., 2009, Table 4 

20 Jung & Haier, 2007. See especially their Figure 2. 

21 Brodmann area 40.


one. In this research correlations are calcu¬ 
lated between many densities of many dif¬ 
ferent brain areas and some form of IQ score. 
In order to avoid reporting results that are
high “by chance,” investigators adopt stringent
statistical criteria. This increases the
chance that low to moderate correlations 
will be overlooked. The second possibility 
is that different people’s brains are actu¬ 
ally organized in somewhat different ways. 
The differences would not have to be major 
ones. There are changes in brain organiza¬ 
tion during childhood, and there is differ¬ 
ential organization across men and women. 
There could be other small, but systematic 
changes due to other demographic variables. 
There is also a psychological possibility. 

If a g measure is obtained using a battery- 
type test, such as the WAIS, an individual’s 
score is determined by a weighted compos¬ 
ite of scores on the subtest batteries. There¬ 
fore, two individuals can obtain the same 
g or IQ score in two different ways - for 
example, one person could have a high IQ 
score on the WAIS by obtaining a high ver¬ 
bal IQ and moderate performance IQ, while 
another person could obtain the same score 
with a moderate verbal score and a high 
performance IQ. More generally, when we 
ask someone to demonstrate “intelligence” 
they do not do so, because intelligence is 
an abstraction. They execute a medley of 
information-processing acts, using different 
parts of the brain for the different acts. If a 
g measure is obtained from a single marker, 
such as a progressive matrix test, the ques¬ 
tions on the test will be fairly complex, and 
will yield to different strategies. The same 
principle applies as in the case of the battery- 
type test; two people may obtain the same 
score using different elementary processing 
actions. When this is the case they will not 
display the same pattern of brain activa¬ 
tion, even though they obtain equivalent 
test scores. 
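
The composite-score point can be illustrated with a toy calculation. The equal weights and the subscale scores below are hypothetical; this is not the actual WAIS scoring procedure, only a sketch of how different profiles can yield the same composite.

```python
def composite_score(verbal: float, performance: float,
                    w_verbal: float = 0.5, w_performance: float = 0.5) -> float:
    """Weighted composite of two subscale scores (the weights are illustrative)."""
    return w_verbal * verbal + w_performance * performance


# Two hypothetical people with mirror-image profiles.
person_a = composite_score(verbal=125, performance=105)
person_b = composite_score(verbal=105, performance=125)

if __name__ == "__main__":
    print(person_a, person_b)
```

The two composites are identical even though the underlying profiles, and presumably the brain regions doing most of the work, are not.
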

7.4.4. Efficiency Counts 

We have been discussing measures of struc¬ 
ture. An alternative way to look at the rela¬ 
tion between intelligence and the brain is 
to examine measures of overall neural efficiency,
thus exploring the idea that in addi¬
tion to having bigger brains, intelligent peo¬ 
ple may have brains made of a better grade 
of neurons. 

The rudiments of this idea can be traced 
back to Galton’s studies of reaction time. It 
is also captured in modern studies of rela¬ 
tions between intelligence and the speed of 
information processing during very simple 
tasks, as discussed in Chapter 6. However, 
the evidence connecting overt response 
speed to neural measures is at best indi¬ 
rect. Studies that have used the new imaging 
technologies to examine brain action dur¬ 
ing thinking are more informative, especially 
since it was not clear in advance what they 
should have found. 

Suppose two people attack the same 
problem. One of the problem solvers has 
markedly higher intelligence test scores than 
the other one. Would you expect the person 
with the higher scores to show more or less 
metabolic and neural activity than the per¬ 
son with the lower scores? 

A case can be made for either answer. 
A high metabolic rate might be a sign that 
highly intelligent people have more mental 
energy in a literal sense, for their metabolic 
systems might provide more energy to their 
brains. A low metabolic rate might be a sign 
that highly intelligent people have more effi¬ 
cient neural systems, and therefore need less 
energy per mental computation than less 
intelligent people do. 

In 1988 Richard Haier's University of 
California, Irvine, group showed that the 
“more efficient” hypothesis is correct. They
obtained PET scans from eight univer¬ 
sity students as the students solved Raven 
Advanced Progressive Matrices (RAPM) 
problems. The correlations between meta¬ 
bolic rates and RAPM scores were in the 
−.7 range, varying somewhat across areas
of the brain. 22 As a result of the finding by 
Haier’s group, the weight of the evidence 
shifted toward the hypothesis that intelli¬ 
gence is associated with brain efficiency, not 
energy generation. 

22 Haier et al., 1988, Table 2. 


Haier and colleagues' findings have been 
replicated, using fMRI and other techniques. 
A particularly interesting line of research has 
been followed by Austrian researchers using 
the very different EEG/ERP technology. 23 
They found that the amplitudes of neural 
responses to verbal or spatial problems were 
negatively correlated with measures of ver¬ 
bal or visual-spatial reasoning, respectively. 
There was also an interesting male-female 
difference. The relations between EEG 
amplitude and test scores were strongest for 
males attacking spatial-visual problems and 
females attacking verbal problems, mirror¬ 
ing the differences between men and women 
in verbal and visual intelligence. 

A study from Carnegie-Mellon Univer¬ 
sity also bears on the issue of efficiency. 
University students were given the reading 
span test described in Chapter 6. On the 
basis of their scores they were divided into 
groups having high or low verbal working 
memory. All groups then attempted sen¬ 
tence comprehension tasks, with sentences 
that varied widely in their linguistic com¬ 
plexity (e.g., sentences with high or low fre¬ 
quency words, simple or complex syntax]. 
Their brains were imaged using fMRI during 
the sentence comprehension task. Activa¬ 
tion in areas associated with linguistic anal¬ 
ysis decreased with working memory span, 
and increased with linguistic complexity. 24 

Two further studies are worth noting, 
because they illustrate the flexibility of the 
brain. In addition to finding that people with 
high verbal working memory spans showed 
less overall activation, the Carnegie-Mellon 
group also found that the high-span individ¬ 
uals showed a greater response to increas¬ 
ing linguistic complexity than did low-span 
individuals. There was evidence of greater 
coordination in activation of different areas 
of the brain in the high-span than in the 
low-span group. 

The second study, by a different group, 
utilized the “three back” task. In this task 
a series of verbal or figural stimuli are 

23 Neubauer, Fink, & Schrausser, 2001; Neubauer et al., 

2005. 

24 Prat, Keller, & Just, 2007. 
presented. The participant is supposed to
respond if the current figure is identical to 
one presented three items back in the series. 
As an example, using letters, suppose that 
the series was 

ABCARQKKQ. 

A “three back” response should be given 
to the second A and the second Q. These 
are called targets. The second K should not 
be responded to, because it is “one back” 
rather than “three back.” Such items are 
called lures. Lures are frequently misidentified
as targets. It evidently requires an
effort to suppress responding, for fMRI scanning
showed that when lures are presented
there is an increase in metabolism in the 
frontal and parietal areas. This was inter¬ 
preted as an indication that an inappropri¬ 
ate response was being suppressed. Scores 
on the Raven’s Matrix test were positively 
correlated both with accuracy and with the 
size of the metabolic response. 25 
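
The scoring rule for the "three back" task can be sketched as follows. The definition of a target comes from the description above; the definition of a lure used here (an item that repeats a recent item at some lag other than three) is my own generalization from the single "one back" example, and the function name is hypothetical.

```python
def classify_three_back(series: str, n: int = 3) -> list[tuple[int, str, str]]:
    """Label each item as 'target' (matches the item n back),
    'lure' (repeats one of the previous n items at some other lag), or 'other'."""
    labels = []
    for pos, item in enumerate(series):
        if pos >= n and series[pos - n] == item:
            kind = "target"
        elif item in series[max(0, pos - n):pos]:
            kind = "lure"
        else:
            kind = "other"
        labels.append((pos, item, kind))
    return labels


if __name__ == "__main__":
    for pos, item, kind in classify_three_back("ABCARQKKQ"):
        print(pos, item, kind)
```

Applied to the series in the example, the sketch marks the second A and the second Q as targets and the second K as a lure, matching the description above.
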

These results are typical of other find¬ 
ings. Several research groups have reported 
that when they contrast metabolic or elec¬ 
trical activity from people who, by other 
means (intelligence tests or, often, measures 
of working memory) have been shown to
have different levels of cognitive skills, the 
more skilled tend to activate fewer brain 
sites. 26 

What happens when the brain is 
“on idle?” Haier’s UCI group asked this 
question. 27 They measured brain activa¬ 
tion while people were watching videos, 
but not otherwise engaged. In this admit¬ 
tedly exploratory study, high Raven Matrix 
scores were associated with greater inte¬ 
gration between the object-recognition and 
linguistic-processing areas of the brain. 
There is an interesting resemblance between 
this result and Ackerman’s contention (dis¬ 
cussed in Chapter 5) that intelligent people 

25 Gray, Chabris, & Braver, 2003. 

26 Bornkessel, Fiebach, & Friederici, 2004; Caplan,

Waters, & Alpert, 2003; Neubauer et al., 2005; Lar¬ 
son et al., 1995. 

27 Haier, White, & Alkire, 2003. 


show an above-average intellectual engage¬ 
ment with tasks that are not directly related 
to their work or study. Even when watch¬ 
ing videos from commercial television, the 
intelligent brain does not completely disen¬ 
gage. 

Both the structural and process technolo¬ 
gies present the same basic message. Intel¬ 
ligence is associated with larger brains, and 
with more efficient brains. However, there 
is no single hot spot in the brain, associ¬ 
ated with all aspects of cognition. The brain 
provides a tool kit for intelligent action. 
An intelligent person has somewhat higher 
quality tools, and organizes them more effi¬ 
ciently, than an unintelligent person. 

7.5. The Brain and Specific Cognitive 
Functions 

Accepting the idea that the brain is a tool 
kit, we now look at some of the special¬ 
ized tools. We first look at the relation 
between working memory and brain func¬ 
tions, on the grounds that working memory 
is tightly enmeshed with general reason¬ 
ing, and hence our most important sin¬ 
gle general processing capacity. We then 
look at the brain structures underlying other 
information-processing capacities that have 
been associated with various aspects of 
intelligence. 

7.5.1. The Brain and Working Memory: 
Evidence and the P-FIT Model

Figure 7.3 summarizes the findings of 
over two hundred studies in which brain 
metabolism was measured as people did var¬ 
ious activities related to attention and rela¬ 
tively short-term memory, including oper¬ 
ations on information while it is being held 
in memory. 28 The reviewers summarized the 
study as showing that working memory tasks 
showed heavy involvement of the frontal 
and parietal cortices, and the anterior cingu¬ 
late gyrus. Lateralization depends upon the 

28 Cabeza & Nyberg, 2004. See also Smith & Jonides, 

1999 


INTELLIGENCE AND THE BRAIN 





Figure 7.3. A sketch of areas of the brain that show activity during a variety of tasks involving working 
memory. The upper figures show a lateral view of the cortex; the lower figures show a medial view. 
From Cabeza & Nyberg, 2000, Figure 3. Reprinted with the permission of Massachusetts Institute of 
Technology Press Journals. 


nature of the stimuli. In general, problems 
involving spatial and figural stimuli produce 
greater activation in the right than in the 
left hemisphere, while the reverse pattern is 
seen for problems involving language. 

Figure 7.3 shows what parts of the brain 
are involved in working memory tasks in 
general. But to what extent are these areas 
related to individual differences in intelli¬ 
gence, and especially to either (depending 
on your theoretical predilections) g or Gf? 
One approach to this question is to take 
tasks that are known to have high g load¬ 
ings in the intelligence literature, such as 
progressive matrix tests, and to determine 
what areas of the brain are active when 
people do these tasks. Another strategy is 
to determine what sort of brain injuries 
result in selective loss of the ability to deal 
with Gf-type problems, as opposed to Gc- 
type problems, where the solution depends 
largely on retrieving previously acquired 
information. 

Both approaches lead to essentially 
the same answer. 29 Jung and Haier have 

29 Many studies could be cited. Some representative 

ones are Duncan, 1996, and Duncan, Burgess, & 

Emslie, 1995, for brain injury; Duncan et al., 2000,


produced the Parieto-Frontal Integration Theory
(P-FIT), a theoretical model that does
a good job of encapsulating our present
knowledge. 30 They propose that our ability
to do the sort of thinking captured by
measures of Gf and working memory is sup¬ 
ported by a system of brain regions involv¬ 
ing the dorso-lateral frontal cortex, the 
parietal lobe, and the anterior cingulate 
gyrus. Each of these regions performs some 
of the functions needed for abstract reason¬ 
ing and problem solving; no one of them is 
sufficient alone. 

Jung and Haier's idea of the role of
the frontal cortex is consistent with other 
observations, both of imaging results and of 
the sort of scattered thinking displayed by 
Phineas Gage, and by many other patients 
with frontal lobe damage. In general, the 
frontal lobes seem to be necessary to keep 
a person on-task. 31 In one study that nicely

and Haier, White, & Alkire, 2003, for differential 
activation of brain areas during imaging; and Colom, 
Jung, & Haier, 2006, for a discussion of the relation 
between neural density and g loadings. 

30 Jung & Haier, 2007. 

31 See Duncan et al., 1996, for a good discussion of 
this point. Shallice (2004), provides an independent 
review of frontal lobe functioning that is generally 
consistent with Jung and Haier's view.
illustrates the effect, it was shown that children
who did not have a history of attention
disorder prior to injury to the frontal, and
especially prefrontal, cortex displayed symptoms
of attention deficit disorder after the
injury. 32 One of the intriguing findings on 
this topic is that one area of the frontal lobe 
appears to be responsible for orchestrating 
thinking about things and abstract ideas, 
while another region orchestrates thinking 
about socially relevant topics. 33 

Jung and Haier argue that the parietal 
cortex is responsible for integrating infor¬ 
mation from various sensory modalities. 
This would be consistent with the parietal 
cortex’s established role in controlling the 
deployment of attention externally, to par¬ 
ticular regions of and objects in the sensory 
fields. 34 The role of the parietal cortex in 
providing temporary storage areas for infor¬ 
mation also appears to be well established. 
This seems to be an area where the lateral¬ 
ization is especially well marked; linguistic 
information, the phonetic loop in Baddeley's 
model, resides in the left (in most of us), 
while spatial and object information is held 
on the right. 35

Jung and Haier propose that the anterior
cingulate gyrus acts as a response selection
device. It is responsible for directing
decisions, albeit with substantial regulatory
input from the frontal lobes. The anterior
cingulate gyrus also seems to weigh
the likely consequences of taking an action.
Clancy Blair, a developmental psychologist
at Pennsylvania State University, has
pointed out that recent research, which
it would take us too far afield to examine,
has shown that emotional evaluation
of outcomes plays an important part in
response selection, even in situations where
one would expect rational decision making
to be the norm. 36 Combining Blair's views
with those of Jung and Haier, an important part of the
frontal cortex-cingulate gyrus interaction

32 Max et al., 2005. 

33 Beer, Shimamura, & Knight, 2004. 

34 Posner et al., 2006. 

35 Cabeza & Nyberg, 2000. For a good illustrative study, see Smith & Jonides, 1999.

36 Blair, 2006. 


may be modulation of the emotional and 
nonemotional aspects of decision making. 

In summary, there is clear evidence that 
the working memory system, which we 
know is central to reasoning and general 
intelligence, is supported by a brain system
involving regions of the frontal lobe,
the parietal lobe, and the anterior cingulate 
cortex. I do not want to give the impression 
that these are the only areas involved, or that 
all the details of the involvement have been 
worked out. They have not, but the outline 
is clear. 

Having disposed of g, we now look
at brain correlates of the three secondary
dimensions of the g-VPR model: verbal intelligence,
perception, and spatial rotation.
The discussions of perceptual and spatial-rotation
dimensions will be collapsed into
one, with an emphasis on the R dimension.
This is because the imagination of movement
in space is central to spatial reasoning,
which seems to me more intellectual than
detecting small differences in visual stimuli,
and because the ability to visualize movement
appears to be related to mathematical
and numerical reasoning.

7.5.2. Structures Associated 
with Verbal Intelligence 

Where is language in the brain? The answer 
to this question can be brief, not because 
so little is known but because so much is 
known that reference can be made to general 
textbooks and reviews. There is also what is 
known as the “issue of granularity.” When 
we talk about a person having high verbal 
intelligence, we mean that the individual 
displays a high level of general competency 
with language. This includes a large vocabulary
and the ability to follow complex arguments.
We do make a distinction between
comprehension and expression, acknowledging
that some people who comprehend
well do not always express themselves
well.

At this level of detail the neuroscientific
basis of normal language comprehension has
been known since late in the nineteenth century.
There are major centers for language
comprehension in the left posterior frontal
lobe (Broca’s area) and in the left posterior
temporal lobe (Wernicke's area). These
are shown in Figure 7.4. These regions were
identified on the basis of clinical studies.
Patients with lesions in Broca’s area understand
language but are unable to express
it. Patients with lesions in Wernicke’s area
can speak, but their sentences are often
incoherent.

Figure 7.4. Broca’s and Wernicke’s regions on
the left side of the brain. The display is for a
right-handed individual. Sketch by the author.

Imaging studies have confirmed the
nineteenth- and twentieth-century neuropsychological
findings. Figure 7.5 shows a
summary of regions where imaging studies 
have identified brain activation associated 
with the perception and naming of words. 
While there is some involvement of the right 
hemisphere for listening, without a spoken 
response, the majority of the active centers 
are clearly in the left posterior frontal and 
temporal regions. 

A similar picture is obtained in studies
showing relations between verbal comprehension
test scores and neural density. 37
One project (covering two studies, so self-replication
was accomplished) obtained correlations
greater than .7 between the density
of gray matter in the posterior left temporal
region and participants’ scores on the vocabulary
and information subtests of the WAIS,
the two tests most associated with verbal
comprehension. 38 Another has reported that
adolescents with high scores on the vocabulary
subtest of the WAIS show higher densities
of gray matter in, somewhat surprisingly,
the parietal lobe. 39 Two other reports

37 Colom et al., 2009. Gc tasks tend to stress language ability (Carroll, 1993).

38 Colom et al., 2006, Tables 2 and 4. 

39 Lee et al., 2007. 


have indicated that reduced neural density 
in the temporal and frontal lobe regions 
associated with language identifies young 
children at risk for developing reading 
difficulties. 40 

If we move to a more complex task,
sentence comprehension, we find a similar
picture, with one addition. There begins
to be more activation of the left frontal
regions, those regions that are also involved
in working memory tasks involving words or
sentences. 41 This is hardly surprising. One
of the more important findings, though, is
that the more complex the sentence comprehension
task, the greater the activation
level in all areas associated with language
comprehension. 42 These findings are now so
well established that research has moved forward
to attempts to locate particular specialized
functions - for example, processing
a verb - rather than repeating studies
looking for the general location of language.
At this point students of individual differences
are apt to lose interest, because of the
need to maintain Brunswikian symmetry by
trying to correlate concepts that are at the
same level of granularity. Verbal intelligence
is a broader concept than comprehending
verbs.

7.5.3. Structures Associated with 
Perceptual and Rotational Skills 

Verbal skills and perceptual and rotational
skills (more properly, imagery and spatial-reasoning
functions) are produced by different
parts of the brain. To explain this we
must consider how the brain conducts visual
analysis.

Visual stimuli are initially analyzed in the
optic tract and connected subcortical structures,
prior to arriving in the occipital lobe,
at the back of the brain. Analyses in the
occipital lobes construct three-dimensional
representations of an object from the two-dimensional
pattern on the retina. The

40 Deutsch et al., 2005; Hoeft et al., 2007. 

41 Caplan, Waters, & Albert, 2003; Colom et al., 2009; Cooke et al., 2006.

42 Prat, Keller, & Just, 2007.

Figure 7.5. Areas that have shown activation during selected language comprehension tasks. Note the
distinction between visual and auditory input, and between reception and reception accompanied by
a verbal response. The upper figures show a lateral view of the cortex; the lower figures show a
medial view. From Cabeza & Nyberg, 2000, Figure 6. Reprinted with the permission of Massachusetts
Institute of Technology Press Journals.


(interpreted) visual image is then analyzed
by two neural pathways. One goes from the
occipital lobe along a dorsal route through
the parietal lobe; the other moves ventrally
into the temporal lobe (see Figure 7.6). The
ventral pathway is largely concerned with
identifying what the stimulus is, including
identification of its attributes (e.g., color).
The dorsal pathway is concerned with where
the stimulus is, and whether or not it is moving.
Individual differences in visual-spatial
reasoning, Gv in the three-stratum model



Figure 7.6. The dorsal and ventral visual 
streams. The ventral stream, through the 
temporal lobe, is primarily concerned with the 
identification of stimuli and their attributes. The 
dorsal stream carries information about location, 
movement, and configuration. Sketch by the 
author. 


and the PR dimensions in the g-VPR model, 
are related to gray matter cortical density 
along both pathways. 43 

Imagery and visual reasoning utilize brain
structures that are close to, although not
exactly identical to, the structures that support
vision. Thus, as a rough approximation,
we can think of the brain's visual analysis
system as being driven either by the visual
sensory system or by some “executive controller”
in the brain itself. 44 In terms of the
tool kit analogy, different parts of the brain
can be used to attack a task, depending upon
whether a verbal or a spatial-visual strategy
is chosen. Panel 7.5 contains an example
illustrating this point.

Male and female brains deal with visual-spatial
tasks in somewhat different ways.
This is interesting because there are marked
male-female differences in visual-spatial reasoning,
especially if the task involves visualizing
motion. There is also a substantial
amount of evidence indicating that the
level of androgens influences visual-spatial
reasoning. Prenatal influences seem to be

43 Colom et al., 2009, Figure 4. Note the Gv correlates indicated in this figure.

44 Ganis et al., 2004; Kosslyn, 1994.

Panel 7.5. The Sentence Verification
Paradigm and Its Neural Correlates

The sentence verification paradigm has
been widely used to study verbal comprehension.
The participant first reads a
sentence describing a picture and then
sees the picture. The task is to determine
whether or not the sentence accurately
describes the picture. Quite simple
pictures are used. A favorite is the “plus
above star” arrangement, where the picture
is either a plus above a star or a star
above a plus.

The simplest descriptions of these pictures
are “star above plus” and “plus above
star.” More complicated sentences can be
used, for example, “plus not below star.”
In either case the task can be regarded
as a prototypical linguistic act, where the
reader must coordinate verbal and perceptual
information.

There are two strategies for sentence
verification. In the visual strategy the participant
reads the sentence and creates an
image of the anticipated picture. When
the picture is shown the participant
decides if it matches the image. In the
verbal strategy the participant first reads
and memorizes the sentence. When the
picture is shown the observer describes
it, covertly, and decides whether or not
the verbal description has the same meaning
as the memorized sentence. College
students can use either strategy.* Marcel
Just’s group at Carnegie Mellon University
took fMRI images of the same people
when they were instructed to use
either the verbal or visual strategies, thus
relying on either verbal or visual working
memory.† The centers of activation
were in either the left posterior frontal
region (verbal strategy) or the right parietal
region (visual strategy), depending
on the strategy used. Figure 7.7 illustrates
the result.

This experiment shows that you cannot
say that a particular part of the brain
controls this or that piece of cognitive
behavior. It depends upon how the behavior
is achieved.

* MacLeod, Hunt, & Mathews, 1978; Mathews, Hunt, & MacLeod, 1980.

† Reichle, Carpenter, & Just, 2000.


especially important. To show how general 
this is, the phenomenon can be produced 
in rats and mice. The situation seems to be 
quite complicated, but there is little doubt 
that individual differences in brain functioning
do affect visual-spatial reasoning.

Orienting ability, the ability to develop a
mental map of an area and to locate oneself
in it, is worth special discussion. Orienting
ability is certainly part of intelligence
in the conceptual sense, but it is assessed
only indirectly by conventional intelligence
tests, due to the limitations of the conventional
“Drop in from the Sky” testing format.
This is one of the interesting cases where the
argument that learning must influence the
brain can be supported by evidence. London
taxicab drivers show enlarged hippocampi,
compared to appropriate control subjects. 45
In a more controlled setting, hippocampal
volume is positively associated with how
well people learn to navigate a computer-generated
virtual environment. 46

7.5.4. Long-Term Memory 

Working memory is the workbench used to 
keep track of things related to the problem 
that is before us. Long-term memory refers 
to the ability to acquire some information,
put it aside for a while (which may be anywhere
from a few minutes to a year), and
then recall it when required. What brain 
structures are involved, and how do they 

45 Maguire et al., 2003. 

46 Moffat et al., 2007.

relate to individual differences in long-term
memory? It turns out that the answers to
these questions depend upon the sort of
memory we are talking about. We have to
distinguish between three different types of
memory, and two different recall mechanisms.

Episodic memory refers to memory for
specific events that have occurred in one’s
life - for instance, what you had for breakfast
yesterday morning. Semantic memory refers
to memories of how the world works; knowing
that “hens lay eggs” is a piece of semantic
memory. Procedural memory refers to
knowledge of how to do things. Riding
a bicycle is an often-used example. Logically,
one might think that episodic memory
would be central, and that semantic and
procedural memories would derive from it.
This is not the case. It is possible to acquire
semantic and procedural information without
storing an episodic record of how that
information was recalled. The issue is not
one of forgetting, which is normal. (Do
you remember when you learned that hens
lay eggs?) In some circumstances semantic
and procedural information can be acquired
even though no episodic record is ever laid
down. To explain this we have to look at
recall.

Recall can be explicit (declarative) or 
implicit. Recall is explicit if you are aware 
of the act of recall. If I ask you what you had 
for breakfast, you will be aware of recalling 
the answer. Whether you recall accurately is 
another matter. 

Implicit recall is shown by a demonstration
that behavior has been altered by
an experience, even though that experience
is not available for explicit recall.
Implicit recall is dramatically illustrated by
patients who have experienced electroconvulsive
shocks to the brain, a form of therapy
that actually has a history of benefits
in some psychoses. The patient will typically
not remember the experience but
does show signs of nervousness if he or
she reenters the room where the therapy
took place. However, implicit recall is not
solely associated with pathologies. It can be
demonstrated in an undergraduate psychology
laboratory, using quite innocuous procedures. 47

Studies associating individual differences
in memory with intelligence generally deal
with explicit recall of episodic and semantic
information. Many episodic memory studies
deal with very short time intervals, often
only a few seconds, as in the digit span test,
which is part of many intelligence test batteries.
Explicit recall of semantic memory
is also tested, often under the title “test of
general knowledge” or “vocabulary.” It could
be argued that semantic memory is a prototypical
example of a crystallized intelligence
(Gc) skill.

We have already seen, in the case of H. M.
(panel 7.4), that someone who has lost his
or her hippocampus will literally not recall
a meeting that took place fifteen minutes
earlier. H. M. had lost episodic memory.
Similar dramatic losses of memory have
been related to damage to the limbic system
that is associated with alcoholism. 48 In these
cases the frontal lobe may also show signs
of damage. Dramatic losses of episodic and
then semantic memory occur in Alzheimer’s
disease, a variety of senile dementia that can
occur as early as age fifty and that affects
3-4% of people between sixty-five and
seventy-four and approximately half of all
Americans over the age of eighty-five.
Alzheimer’s disease is associated with
widespread reduction in brain volume,
including both the frontal
lobes and the hippocampal formation.

Are variations in brain structure also associated
with individual differences in memory
within the normal range? This is a difficult
question to answer, because we have
to distinguish among the various types of 
memories and how they are developed. 

47 The technique used is to present people with very
long lists of words, more than they can recall. Subsequently
they are asked to perform some task that
tests their readiness to use a word on the list, or some
other word. There will be a tendency to use the
word on the list, even though people cannot recall
seeing it on the list. See Jacoby, Toth, & Yonelinas,
1993.

48 Sullivan & Marsh, 2003.

Figure 7.7. Regions of the brain activated within a single
individual when that person was instructed to solve a sentence
comprehension problem using either a verbal or a visual imagery
strategy. The task is described in panel 7.5. From Reichle et al.,
2000, Figure 3, with permission from Elsevier.


Consider how memories come to be.
First, you have an experience. An internal
representation of that experience will be
created. This is a constructed representation
of what is going on, an act of problem
solving that takes place largely in the
frontal-parietal-anterior cingulate circuit.
The information in the coded representation
is then transferred to long-term memory,
along with its connections to information
already in memory. This function is
carried out by the hippocampus and other
structures in the medial temporal cortex.
The hippocampus is essential for the storage
process, but, with the exception of maps
of spaces, storage apparently takes place all
over the cortex. 49

Given this two-stage process, it is not surprising
to find that individual differences
in learning and memory have been associated
with measures of volume in both
the frontal lobes and the hippocampus and
in related structures. The studies are small
(in part because they are expensive), so
there has been a tendency to focus on
extreme groups, but the picture is consistent.
Both the frontal-parietal-cingulate
gyrus and the medial temporal cortex systems
are required - one to decide what is
to be remembered and one to carry out the
storage process. 50 As is the case for virtually
all findings in the rapidly developing field of
neuroscience, not all the details are filled in.

One thing is quite clear. Memories do 
not reside in individual neurons; they reside 
in patterns of neurons. Accordingly, it has 
been suggested that intelligence depends 
upon the brain's ability to establish new connections
between neurons, as well as any
structural differences in the regions involved 
in memorization. What is the evidence for 
this? 


7.6. Neural Plasticity 

Human intelligence depends upon the ability
to learn. When we learn something we
change the brain, both in the sense that new
memories are stored in the brain and in the
sense that as we learn to do things (including
learning how to solve problems), the brain
is reorganized. The case has been made that
differences in the ability to reorganize the
brain in the face of experience, individual
differences in neural plasticity, may be an
important aspect of intelligence. Computer
simulations have shown that things could
work this way. 51 But do they?

49 Eichenbaum, 2004. 

50 For typical work, see Habib, McIntosh, & Tulving, 
2000; Rosen, 2003; Sullivan & Marsh, 2003; Tulving 
et al., 1999. 

51 See Garlick, 2002, for an example of this sort of 
argument. 


The effects of increasing intelligence
upon brain mechanism are mirrored by the
effects of increasing practice; the more practice
on a task, the lower the measured
metabolic rate. In another set of PET studies
by the UC Irvine group, Haier and his
colleagues had students learn to play the
computer game TETRIS, a highly demanding
visual-spatial task. There was a marked
decrement in metabolism as the task was
learned. In addition, the greatest decrements
were observed in people with the highest
intelligence test scores. Similar findings have
been observed with quite different tasks. 52
Corroborating evidence comes from the
study of infant intelligence. There have been
many attempts to identify characteristics of
infant behavior that could predict later intelligence.
One of the most successful procedures
is a technique known as habituation.
In a habituation study an infant is shown a
picture, and then shown the now-familiar
picture along with a novel picture. Habituation
is measured by the extent to which the
infant looks at the novel picture. The idea
is that this tests the ability to form a memory
of the familiar picture, that is, to reorganize
neural patterns to reflect experience.
Habituation measures taken in the first year
of life have a correlation of .36 (corrected
for unreliability, .53) with IQ scores taken
at age 21. 53
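
The phrase "corrected for unreliability" can be made concrete with a short calculation. The sketch below assumes that the correction is Spearman's classic correction for attenuation; the text does not say which formula was used, and the function names are illustrative only. Because the reliabilities of the habituation and IQ measures are not reported here, the code works backward from the two quoted correlations to the reliability product they jointly imply.

```python
import math


def disattenuate(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(rel_x * rel_y)


def implied_reliability_product(r_observed: float, r_corrected: float) -> float:
    """Product r_xx * r_yy implied by an observed/corrected correlation pair."""
    return (r_observed / r_corrected) ** 2


# Values quoted in the text: observed .36, corrected .53.
product = implied_reliability_product(0.36, 0.53)
print(f"implied reliability product: {product:.2f}")

# Round trip: applying the correction with that product recovers .53.
# Assigning the whole product to one measure is an arbitrary choice made
# only for this check; the separate reliabilities are not given in the text.
print(f"corrected r: {disattenuate(0.36, product, 1.0):.2f}")
```

Under this assumption the two measures would have a combined reliability product of roughly .46, which is why the corrected value is so much larger than the observed one.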

Studies like these suggest that there is a
relation between intelligence and the ability
to organize neural patterns over a brief
period of time. However, the neural reorganization
is inferred rather than observed.
An important study using the diffusion tensor
imaging technique provided direct evidence,
over a much longer period of time.
During the early years of life the cerebral
cortex first thickens, then thins. The process
is not completed until late adolescence. It is
believed to be associated with selective rearrangement
of cortical neurons. The trajectory
of development is much more sharply

52 Haier et al., 1992. See also the conceptual replication,
using a verbal memory task, by Habib, McIntosh,
& Tulving, 2000.

53 Fagan, Holland, & Wheeler, 2007. 




defined in individuals with high IQ scores
(>120) than in individuals with lower
scores. 54 This could be due to individual differences
in the ability to reorganize the brain
to incorporate new information, or it could
be due to programmed (i.e., genetic) differences
in the developmental progression of
the neural system.

Many studies have shown that high intelligence
test scores are correlated with faster
learning, in contexts ranging from laboratory
tasks through academic studies and on
to workplace apprenticeships. Any learning
has to be associated with some sort of
reorganization in the brain. Individual differences
in neural plasticity could produce
such reorganization, but so could individual
differences in other brain mechanisms. For
instance, it could be that superior learning is
mediated by a superior ability to focus attention,
something that the intelligent clearly
have. Why intelligence is associated with
rapid learning remains in question. There
may not be any one answer.

7.7. What Do We Learn from Studies 
of the Brain and Intelligence? 

The brief answer to this question is “Lots,
and there is more to come.” A longer answer 
is more thoughtful, and a bit more reserved. 

Intelligence is associated with multiple
brain systems. General reasoning is supported
largely by circuits between the dorsolateral
frontal cortex, the parietal cortex,
and the anterior cingulate cortex; verbal reasoning
by structures in the frontal and temporal
lobes, largely but not exclusively in
the left hemisphere; spatial-visual reasoning
(including rotation) by the occipital and
parietal lobes; orientation by the hippocampus;
and episodic memory by the general
reasoning system, to decide what to remember,
and by the hippocampus and related
structures that carry out the process of writing
information into storage locations all
over the cortex.

54 Shaw et al., 2006.

Figure 7.8. The relation between brain systems
and broad (second-stratum) psychometric
abilities. The boxes above the dotted line show
relations between brain systems and the broad
abilities identified in the VPR model of
intelligence. The boxes below the dotted line
show the relations between brain systems and
cognitive skills that show strong individual
differences, but that are not typically evaluated
in psychometric studies due to limitations in the
conventional testing paradigm.

Three of these systems map closely
onto the dimensions identified in Johnson
and Bouchard’s VPR model of psychometric
studies of intelligence. 55 The relationship
is shown, in block diagram, in the
region above the dotted line in Figure 7.8.
The frontal-parietal-cingulate gyrus system
supports general reasoning, with some
hemispheric specialization for the modality
involved, typically verbal versus spatial-visual
reasoning. The system connecting
the occipital and parietal cortex supports
spatial-visual reasoning, while systems in the
posterior frontal and temporal cortex are
essential for verbal reasoning.

The systems approach is consistent with 
several findings showing that the densities of 
both gray matter and white matter correlate 
with intelligence test scores. This includes 

55 Johnson & Bouchard, 2005a.

both general scores, such as the Full Scale IQ
on the WAIS or the g composite on battery-type
intelligence tests, and scores on specialized
tests of verbal and nonverbal reasoning.
The density of gray matter probably reflects
the computing capacity within various specialized
computation centers in the brain,
while the white matter reflects the quality
of the connections between them.

The part of the diagram below the dotted
line in Figure 7.8 shows relationships
between identifiable brain systems and functions
that are clearly cognitive but are outside
the VPR model, even though there are
strong individual differences in them. The
deficit is not unique to the VPR model; it
is shared by all models of intelligence that
are restricted to an analysis of psychometric
data. Learning and orientation are certainly
part of the conceptual meaning of intelligence.
Their evaluation has been excluded
from conventional testing simply because
the necessary behavioral evaluations do not
fit into the testing paradigm. This is a logistical
rather than a scientific consideration,
and ought not constrain our thinking about
intelligence.

7.7.1. Intelligence Emerges from the 
Interaction between Brain Systems 

Terms like “general reasoning capacity” and 
“verbal reasoning” refer to broad dimensions 
of intelligence, Carroll’s 56 second-stratum 
abilities. These, by definition, can be applied 
in many situations. 

The broad dimensions can be broken
down further. This is essentially what studies
of individual differences in information
processing do. Figure 7.9 illustrates this, by
showing a possible further breakdown of
general reasoning ability. This diagram is
not to be understood as a seriously proposed
model, although it has some support.
What it is intended to show is that concepts
such as g and working memory refer to
properties that emerge from the interaction
between components that can be defined
56 Carroll, 1993. 


at the information-processing level, and are 
supported by separate neural mechanisms. 
Similar diagrams could be drawn for verbal
reasoning or for spatial-visual reasoning.
Because broad abilities emerge from the
interaction between components, attempts
to connect broad intellectual abilities with
brain-level and information-processing concepts
should also deal with broad concepts,
such as the neural circuits supporting the 
working memory and attention complex as 
a whole. Attempts to relate a widespread 
trait, such as g, to a component of the brain’s 
functional networks, such as looking for the 
neural locus of intelligence in some small 
part of the frontal cortex, are not likely to 
work out very well. It is probably a good idea 
to remain at the appropriate system level, as 
shown in Figure 7.8, rather than trying to 
break the system into additive components. 

This leaves us with the problem of
explaining the nature of the general intelligence
factor, g. Arthur Jensen has said that
g is “a source of variance in performance
associated with individual differences in the
speed or efficiency of the neural processes
that affect the kinds of behavior called mental
abilities.” 57 This leaves open the questions
of “what neural processes?” and “what
behaviors?” because different cognitive actions
are supported by different brain systems.

Behaviorally, g is virtually synonymous 
with general reasoning ability, which in turn 
is synonymous with individual differences in 
working memory. By this argument the seat
of g is in the frontal-parietal-cingulate cortex
system. However, this raises some problems.
Vocabulary tests are highly g loaded,
but tests involving syntactical and semantic
analyses of single words do not activate
the entire frontal-parietal-cingulate system, 
and do activate areas outside of this system, 
notably in the temporal lobe. 58 The source of 
g seems to jump around as the task changes. 
Why? 


57 Jensen, 1998, p. 74. 

58 See Figure 7.5. For a typical study, see Friederici,
Opitz, & von Cramon, 2000.

Figure 7.9. Hypothetical relationships between brain systems,
narrowly defined information-processing functions, working
memory, and general reasoning ability (g). This diagram is not
proposed as a model, but rather to show the complexity of the
issue, and the need to deal with broadly defined abilities, such as g,
as emerging from a system of interacting components, rather than
being a thing in itself.


7.7.2. What More Do We Need to Know
(and What Has Not Been Shown)?

Investigations of the neural basis of intelligence
are often more compelling than
behavioral studies. The chance to look at the
brains of intelligent and not-so-intelligent
people seems to many to be far more exciting
than poring through analyses of correlation
matrices. There certainly is more
to the neuroscience approach than just an
increase in excitement. The imaging and
enhanced electrophysiological technologies
available today provide researchers on intelligence
with a new source of data. They can
rightly behave like nineteenth-century gold
miners, who rushed from California to Australia
to the Klondike in an effort to strike
it rich in new fields. While a great many
nuggets of information have been mined,
there is no reason to believe that the neuroscience
gold field is about to play out. It
does have its limitations.

Clearly overall brain size is correlated
with intelligence test scores. The correlation
appears to be in the .3 range, which
is not everything, but is not to be dismissed.
I doubt that there will be any great effort
to develop this finding any more, because
of the findings on the differential importance
of various regions of the brain. These
findings, and the multivariate nature of
intelligence itself, make it clear that looking
at something as gross as the relation between
overall brain size and an omnibus intelligence
score, such as the IQs derived from
batteries of subtests, as in the WAIS, has
gone about as far as it can go.

This chapter opened with an analogy 
between the way the brain produces thought 
and the way an orchestra produces music. 
The cognitive orchestra contains several 
functional players: working and long-term 
memory, verbal reasoning, visual-spatial reasoning,
orientation ability, and numerous
subdivisions within each of these functions. 
We now have a reasonably good idea where 
each of these players sits in the brain, 
although there are certainly details to fill in. 
We also know that intelligence is associated 
with slight structural changes and a marked 
shift in the efficiency with which the brain 
utilizes the players. 

What we do not know is how the dif¬ 
ferent players in the brain work. Most cru¬ 
cially, we do not know how the conductor 
works. It is certainly an advance in knowl¬ 
edge to move from saying “intelligence is 
associated with big brains” to talking about 
the locus of reasoning and executive control 
functions in the frontal, parietal, and cingulate
gyrus. Nevertheless, such statements
are maps of where an activity is, not expla¬ 
nations of what it is. It is one thing to say 
that the conductor is the person standing on 
the podium at the front of the orchestra. It 
is another to say that the conductor main¬ 
tains the tempo and signals emphasis for the 
various orchestral sections. It is still another 
thing to say how the conductor maintains 
tempo, and so forth. We cannot stop by say¬ 
ing there is a conductor in, say, the system 
specified by the P-FIT model. That is like 
explaining the conductor by pointing to the 
podium. We have to explain how the var¬ 
ious pieces of brain anatomy achieve their 
functions. This remains a mystery. 

The second mystery has to do with the 
flexibility of the brain. To what extent is the 
brain's development under the control of a 
genetic program? How, and how much, can
this development be altered by experience?
Suppose we grant, for the minute,
that the unusual size of Einstein’s parietal 


cortex was associated with his undoubtedly 
superior mathematical talent. Was Einstein 
able to do mathematics because his pari¬ 
etal cortex was large? Or did Einstein’s pari¬ 
etal cortex become large because he spent 
a lifetime thinking mathematical thoughts? 
On a more prosaic level, we know that 
experienced London taxicab drivers have 
large hippocampi and a superior sense of 
orientation along London’s streets. Is this 
because their hippocampi enlarged as they 
acquired experience with London, or is it 
because novice taxicab drivers with smaller 
hippocampi could not develop the necessary 
cognitive map, and had to find new employ¬ 
ment? We do not know. 

Humans are provided, at conception, 
with the potential for a brain. The poten¬ 
tial varies among individuals. Both in utero 
and throughout their life individuals are 
exposed to physical and social environments 
that affect the development of the brain. In 
the next two chapters we consider how that 
development takes place. 


CHAPTER 8 


The Genetic Basis of Intelligence 


Darwinian man, although well behaved 
At best is a monkey shaved. 

W. S. Gilbert 


8.1. Introduction 

The epigraph, taken from the 1884 Gilbert
and Sullivan operetta Princess Ida, almost
gets the scientific facts right. In the interest 
of scientific accuracy the lines should read 

Darwinian man, though well behaved 

Is only an ape who's shaved. 

for there are some key genetic differences 
between apes and monkeys. We and our 
other ape cousins have lost those ele¬ 
gant monkey tails, developed a different 
limb structure, and have bigger brains. The 
ape-monkey genetic changes took place 
twenty million years ago; five million years
ago more genetic changes produced the 
hominids. Somewhere around 100,000 years 
ago our own species, Homo sapiens, a very 
big-brained species with a limb structure 
that is fine for walking but terrible for 


climbing trees, began to spread from our 
place of origin in Africa, and by somewhat 
less than 15,000 years ago (the date is still 
debated) had reached the southern tip of 
South America, thus populating all the con¬ 
tinents except Antarctica. 

Genetic change did not stop as humans 
wandered across the globe. A very large 
majority of the genes that define humanity 
are shared by all humans, but there are 
significant genetic differences between pop¬ 
ulations. Swedes, Tanzanians, Chinese, and 
the Quechua Amerindians who live in the 
Andes differ in their appearance - within the 
range of human differences - and they differ 
in other ways as well. These include sensi¬ 
tivity to sunlight, the ability to metabolize 
milk and milk products, and susceptibility 
to a number of diseases. So it is reasonable to 
believe that there may be genetic differences 
in intelligence as well, and in Chapter 11 we 
will review the evidence that there are. 

But, and it is a very big but, one of the rea¬ 
sons that humans were able to settle all over 
the globe, in very different ecological sys¬ 
tems, is that their big brains and enhanced 
capabilities for learning allowed them to 
develop different cultures, to adapt to different
environments. Then these cultures took
on a life of their own, as they were socially 
transmitted from generation to generation. 
So it is reasonable to believe that there may 
be sociocultural differences in intelligence. 
And there are. 

Theodosius Dobzhansky, one of the
great geneticists of the twentieth century, 
said 

Nothing in biology makes sense except in 

the light of evolution. 

Theodosius Dobzhansky 
(The American Biology Teacher, 
March 1973, p. 129) 

He was right, but when it comes to 
human intelligence we have to consider both 
biological and social evolution. Is intelli¬ 
gence something you inherit, genetically, 
or acquire through experience? The short 
answer is “both.” The long answer is a lot 
more complicated. 

The bad news is clear. There are genet¬ 
ically determined conditions that virtually 
guarantee mental disability. Mental disabil¬ 
ity can also be produced by environmental 
conditions, notably any event that results in 
brain damage. In other cases genetic config¬ 
urations raise the risk of mental disability, 
but the degree of disability depends upon 
environmental conditions. 

The good news is not so clear. Lots of 
things influence intelligence in the normal 
range, roughly from IQ 70 upward. Test 
scores are not as accurate in predicting cog¬ 
nitive competence, intelligence in the con¬ 
ceptual sense, in the upper range as they 
are in predicting mental deficiencies. Many 
mental disabilities are caused by one or a few 
genetic anomalies, or by catastrophic envi¬ 
ronmental events that produce brain dam¬ 
age. Variations in intelligence within the 
normal range are seldom caused by single 
events. They result from the cumulative 
effect of many genetic and environmental 
factors, with each factor having only a small 
effect. Therefore, the effects may be evident 
only on a population basis, and can be doc¬ 
umented only by statistical analyses. 


This chapter focuses on the genetic basis 
of intelligence. The next discusses environ¬ 
mental factors. No smoking guns will be 
revealed, for there are none. On a popula¬ 
tion basis, the effects of both genetic inher¬ 
itance and environmental circumstances are 
measurable. Except in the case of men¬ 
tal disability, assigning a particular person's 
intelligence to genetics or environment is 
virtually impossible. 

The chapter is divided into four sec¬ 
tions. The first is a quick overview of basic 
genetic theory. The second discusses quan¬ 
titative behavior genetics, a discipline that 
is concerned with the extent to which var¬ 
ious human traits are inherited. We then 
look at molecular genetics, which deals with 
the mechanisms of genetic inheritance. The 
chapter closes with a summarization, and a 
discussion of some of the controversies that 
have surrounded studies of intelligence and 
genetics. 

8.2. A Quick Introduction to Genetics 

This section is a brief introduction to basic 
concepts in genetics. Hopefully it will pro¬ 
vide an adequate introduction for readers 
who have not studied genetics, and a useful 
refresher for those who had a course some 
time ago. 

The genetic model of inheritance was 
established in the mid nineteenth century 
by the Austrian monk Gregor Mendel (1822- 
1884). Today it seems that a monk would 
be an unlikely person to make a scien¬ 
tific contribution, but in Mendel's time a 
monk with a scientific career was not that 
unusual. Mendel spent a considerable por¬ 
tion of his professional life at the University 
of Vienna, and lectured at other institutions 
as well. 

Mendel studied the inheritance of eas¬ 
ily recognizable traits in plants, such as the 
color of pea plants. He discovered that a 
plant had a genetic potential, derived from 
its parent plants, for producing a trait, such 
as leaf color or form. Mendel established 
that 
(a) The genetic potential of both parents is
passed on to the offspring. 

(b) Traits fall into two types. A dominant 
trait will be expressed in an offspring 
if it has inherited the potential for the 
trait from either parent. A recessive trait 
is expressed only if it is inherited from 
both parents. 

(c) Therefore, we have to distinguish 
between the observable characteristics 
of an individual, the phenotype, and the 
genetic configuration, the genotype. 

In modern terms, Mendel had inferred 
the existence of a gene. He also concluded 
that a gene could have two or more different 
forms, now called alleles, that might cause 
a characteristic to be expressed in different 
ways. For instance, a pea’s pod cover can be 
either rough or smooth. 

Imagine a plant that produces either a 
white or a red flower, and that the white 
color is dominant. Write C for the allele 
that produces white, and c for the allele 
that produces red. Furthermore, let the 
first term written be the gene inherited 
from the “alpha” parent (which in plant 
crossings is arbitrary) and the second the 
gene inherited from the “beta” parent. We 
have the following possibilities, using the 
terms homozygous to refer to an individual 
whose genetic makeup consists of two 
genes with the same allele, and heterozygous 
to refer to an individual whose genotype 
contains two different alleles. 


Genotype    Phenotype    Term

CC          White        Homozygous
Cc          White        Heterozygous
cC          White        Heterozygous
cc          Red          Homozygous


Ignoring whether an allele came from the 
alpha or beta parent, there are three possi¬ 
ble genotypes, CC, Cc, and cc, but only two 
possible phenotypes, white and red. The 
genotype implies the phenotype; if we know 
the genotype of a plant, we know the phe¬ 
notype. The converse is not always true, for 


the phenotype “white flower” could be pro¬ 
duced by either a CC or a cC genotype. 

In simple cases genotypes can be inferred 
by combining the information from pheno¬ 
types of two or more generations of related 
individuals. This is called a pedigree study. In 
the example, a plant breeder could conduct 
a pedigree study by reasoning as follows: 

“If I cross red with red, I will always get
red.”

“Suppose I cross red with white and, 
unknown to me, the white flower has geno¬ 
type CC. All offspring will have genotype 
Cc, and thus I will get only white in the 
offspring generation. Let's call this batch of 
flowers 2A.”

This situation is shown in Figure 8.1(a). 

“But suppose that the white flower has 
genotype Cc. Then all the next genera¬ 
tion will inherit a c from one parent and, 
equally likely, either a C or a c from the 
other parent. Call this batch of flowers 2B.
In the 2B population all offspring will have
either genotype Cc or cc, and I expect to
have an equal number of them.”

The resulting distribution of genotypes
and phenotypes will be

Genotype    Probability    Phenotype

Cc          1/2            White
cc          1/2            Red

The situation is shown in Figure 8.1(b).


The frequencies are statistical expecta¬ 
tions, rather than determined values. There¬ 
fore, the breeder cannot expect batch 2B 
to be made up of an exactly equal num¬ 
ber of red and white flowers. However, if 
just one red is seen, that is evidence that the 
white flower in the first generation could not 
have been CC. Therefore, the breeder can 
be “statistically certain” that the first genera¬ 
tion white was CC only if the second gener¬ 
ation is large enough that the probability of 
not generating a red from a first generation 
Cc white is very low. 
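
The breeder's reasoning can be made concrete with a little arithmetic. The following is only an illustrative sketch: the function names and the 1% cutoff for “statistically certain” are assumptions introduced here, not part of the original argument. It enumerates the offspring of a red (cc) plant crossed with a white plant of either genotype, and then asks how many all-white offspring are needed before a Cc white parent becomes implausible.

```python
from fractions import Fraction
from itertools import product


def phenotype(genotype):
    """C (white) is dominant, so any genotype containing C is white."""
    return "white" if "C" in genotype else "red"


def offspring_distribution(parent_a, parent_b):
    """Probability of each offspring phenotype, one allele drawn from each parent."""
    combos = list(product(parent_a, parent_b))
    dist = {}
    for allele_a, allele_b in combos:
        ph = phenotype(allele_a + allele_b)
        dist[ph] = dist.get(ph, Fraction(0)) + Fraction(1, len(combos))
    return dist


def min_offspring_for_confidence(p_white, threshold):
    """Smallest n for which n white offspring in a row has probability < threshold."""
    n = 1
    while p_white ** n >= threshold:
        n += 1
    return n


dist_cc_x_Cc = offspring_distribution("cc", "Cc")   # half white, half red
dist_cc_x_CC = offspring_distribution("cc", "CC")   # all white

# Hypothetical cutoff: call the parent CC if an all-white batch would occur
# less than 1% of the time under the alternative (Cc) hypothesis.
n_needed = min_offspring_for_confidence(dist_cc_x_Cc["white"], Fraction(1, 100))
print(dist_cc_x_Cc, dist_cc_x_CC, n_needed)
```

With a cutoff of 1%, seven white offspring and no red ones are enough, because (1/2)^7 = 1/128. A stricter cutoff simply demands a larger second generation.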


Figure 8.1. A simple illustration of Mendelian genetics. The 
situations shown are the “pedigrees” that might be derived from 
hybridization between two plants bearing a gene that has either a 
"white” allele (C), which results in white flowers, or a "red” allele 
(c), which results in red flowers. The upper figure, (a), shows the 
results that would be expected if a white plant with dominant 
genes only (phenotype = white, genotype = CC) is crossed with a 
plant carrying recessive genes only (phenotype = red, genotype = 
cc). The lower figure shows a similar pedigree assuming that the 
white “parent” had genotype Cc (heterozygous). 


Pedigree studies provide a way of infer¬ 
ring genotypes from phenotypes, based on 
the pattern of inheritance. The idea works 
very well in plants, in situations where the 
expression of a phenotype is completely dic¬ 
tated by the genotype. However, it may 
take several generations before the pattern 
becomes apparent. This is no problem if we 
are dealing with plants that have discrete 
phenotypes, such as color or roughness, and 
where we know the ancestry of each individual.
When it comes to people and intelligence,
neither condition holds. We are dealing
with a continuous trait, so the notion
of discrete phenotypes is blurred. We are 
dealing with a relatively slow-breeding pop¬ 
ulation; a human generation is a little over 
thirty years long. In addition, as will be illus¬ 
trated, some traits may not reveal them¬ 


selves until adulthood or even old age. We 
certainly cannot control mating, so we have 
no way of creating individuals with informa¬ 
tive pedigrees, as a plant or animal breeder 
can. Therefore, while pedigree studies are 
sometimes useful in human genetics, it is 
more often the case that we have to look at 
statistics across populations. That turns out 
to be an advantage, rather than a defect, for 
such studies link genetics to evolution. 

8.2.1. Mendel's Model Applied 
to Populations 

In the 1930s Theodosius Dobzhansky (1900- 
1975), a Russian-born biologist who immi¬ 
grated to the United States in 1927, devel¬ 
oped techniques for studying genetics at 
the population level, without regard for 
individual pedigrees. 1 In addition to this
insight, Dobzhansky's work provided the 
essential link between genetics and evolu¬ 
tion. This section will consider only simple 
cases. Further complications are discussed in 
subsequent sections. 

Dobzhansky worked with the common 
fruit fly, Drosophila melanogaster. From a 
genetic view, Drosophila has three attractive
characteristics: it is cheap; it breeds
rapidly; and it has a number of traits that 
are controlled by single genes with only two 
alleles, as in the examples mentioned ear¬ 
lier. Dobzhansky used Mendel’s principles 
to develop mathematical models of the dis¬ 
tribution of traits in successive generations 
of a population of fruit flies. 

To illustrate, let W and w be the alleles of 
the gene controlling the fly's body color. The 
normal (“wild type”) body color in fruit flies 
is grayish, but there are shiny black (ebony) 
colored fruit flies. Imagine that a laboratory 
has available two large populations, one of 
wild type flies and another of ebony flies. 
At this point we do not know the genetic 
makeup of either population, because we 
do not know if a heterozygous (Ww) fly is 
grayish or ebony. 

We then conduct an experiment in which 
we create a mixed population, containing 
70% wild type flies and 30% ebony flies. In 
flies it is reasonable to assume random mat¬ 
ing. The following combinations of parent¬ 
ages can be expected 

Wild with wild: .70 x .70 = .49, just under
half of the flies in the next generation.

Wild male with ebony female: .70 x .30 =
.21, just over one-fifth of the flies.

Wild female with ebony male: .30 x .70 =
.21, just over one-fifth of the flies.

Ebony male with ebony female: .30 x .30 =
.09, just under one-tenth of the flies.

Now suppose that, unknown to us, all 
wild flies have genotype WW, all ebony flies 
have genotype ww, and W is dominant. If 

1 Dobzhansky’s major work, Genetics and the Origin
of Species, was first published in 1937. Two later
revisions were published, the last in 1951.


these assumptions are correct, 91% (49% +
21% + 21% = 91%) of the flies in the offspring
generation should have the gray, wild type 
phenotype, and 9% should have the ebony 
phenotype. The mathematics is a bit more 
complicated for the second generation - the 
offspring of the offspring - but the same 
principle applies. If we start with a mix of 
WW and ww populations, then the propor¬ 
tion of different genotypes, and by implica¬ 
tion the proportion of phenotypes, is deter¬ 
mined for all future generations. Statistical 
techniques (trust me!) can be used to deter¬ 
mine the most likely starting configuration, 
given the distribution of phenotypes in later 
generations. 
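
The first-generation arithmetic can be checked by enumerating the parental pairings directly. This is a minimal sketch under the assumptions stated in the text (all wild flies WW, all ebony flies ww, W dominant, random mating); the variable names are introduced here for illustration.

```python
from itertools import product

# Starting mix assumed in the text: 70% wild (WW) and 30% ebony (ww).
parents = {"WW": 0.70, "ww": 0.30}

offspring = {"WW": 0.0, "Ww": 0.0, "ww": 0.0}
for mother, father in product(parents, repeat=2):
    # A homozygous parent can pass on only one kind of allele, so the
    # child's genotype is fixed by the pairing.
    child = "".join(sorted(mother[0] + father[0]))  # 'W' sorts before 'w'
    offspring[child] += parents[mother] * parents[father]

wild = offspring["WW"] + offspring["Ww"]   # W is dominant
ebony = offspring["ww"]
print(offspring, round(wild, 2), round(ebony, 2))
```

The two mixed pairings (wild with ebony, in either order) both produce Ww offspring, which is why the heterozygous fraction is .21 + .21 = .42.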

Similar models can be applied for genes 
with more than two alleles and for situa¬ 
tions in which heterozygous genotypes (Cc 
or Ww in the examples) display a phenotype 
that is a mix between the two homozygous 
forms. Although the models are more com¬ 
plicated than the example, the principles are 
the same. 

In order to link genetics to evolution we 
need the concept of selective pressure. This 
can be illustrated by continuing the exam¬ 
ple, including the assumption about the par¬ 
ent population containing 70% WW and 30% 
ww genotypes. The distribution of fertilized 
fly eggs, not mature flies, will have the fol¬ 
lowing distribution of genotypes: 


Phenotype    Genotype    Fraction

Wild         WW          .49
Wild         Ww          .42
Ebony        ww          .09


It is not immediately obvious, but can 
be proven, that if there is random mating 
within a population, the relative frequen¬ 
cies of the genotypes will be maintained. 
This is called Hardy-Weinberg equilibrium. 
In our example, if all three phenotypes 
were equally likely to survive to reproduc¬ 
tive maturity, the fractions would be main¬ 
tained in all subsequent generations. If we 
were to look ten generations ahead, the frac¬ 
tions would still be (.49, .42, .09). 
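
A brute-force check of this claim is easy to write. The sketch below is illustrative only (the function and table names are assumptions introduced here): it sums over every possible pairing of mother and father genotypes, assuming random mating and equal survival, and repeats the process for ten generations.

```python
from itertools import product

# Probability that a parent of each genotype passes on the W allele.
P_TRANSMIT_W = {"WW": 1.0, "Ww": 0.5, "ww": 0.0}


def next_generation(freqs):
    """Offspring genotype fractions, summed over all random mating pairs."""
    out = {"WW": 0.0, "Ww": 0.0, "ww": 0.0}
    for mother, father in product(freqs, repeat=2):
        pair_prob = freqs[mother] * freqs[father]
        m, f = P_TRANSMIT_W[mother], P_TRANSMIT_W[father]
        out["WW"] += pair_prob * m * f
        out["Ww"] += pair_prob * (m * (1 - f) + (1 - m) * f)
        out["ww"] += pair_prob * (1 - m) * (1 - f)
    return out


freqs = {"WW": 0.49, "Ww": 0.42, "ww": 0.09}
for _ in range(10):
    freqs = next_generation(freqs)
print({g: round(v, 3) for g, v in freqs.items()})
```

The fractions come back unchanged because random mating with no selection leaves the frequency of the W allele at .7, and the genotype fractions depend only on that allele frequency.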
This is where evolution comes in. Suppose
that not all phenotypes are equally likely to
survive, or to produce an equal number
of offspring. In the example, it might be that 
ebony-type flies are easier to see, and thus 
more subject to predation, than the wild- 
type flies. The mechanism of selection does 
not matter; the numbers do. 

Suppose that the gray-bodied pheno¬ 
types, all flies with the wild appearance, 
have a 90% chance of reaching sexual matu¬ 
rity (and hence reproducing themselves), 
while animals with an ebony-body pheno¬ 
type have only an 80% chance. This is called 
a selection pressure; some genotypes are more 
likely to reproduce successfully than are 
other genotypes. Selective pressure will dis¬ 
tort the population frequencies in the breed¬ 
ing population of the next generation. In this 
example, the distribution of genotypes in 
the breeding population of the first, second, 
and third generations would be: 


Generation    WW      Ww      ww

1             .495    .424    .081
2             .505    .418    .077
3             .515    .413    .073
10            .574    .372    .054


There is a drift toward domination by 
more viable phenotypes (genotypes WW 
and Ww), which comprise just under .92 
of the genotypes in generation 1, rising to 
.946 after ten generations. After 100 genera¬ 
tions the frequencies would be WW = .851, 
Ww = .143, ww = .005, with a .001 rounding 
error. The evolutionary pressure favoring a 
particular phenotype exerts a pressure that 
changes the genotypical distribution in the 
population. However, the unfavorable ww 
genotype never quite dies out. Why not?
The w alleles hide out, in the Ww geno¬ 
type, which has a nonpathological pheno¬ 
type. The ww form can reoccur when two 
Ww genotypes mate. Such matings account 
for 99% of the ww (pathological) genotypes 
at fifty generations. 
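
The selection example can be sketched as a short simulation. This is one plausible reading of the model, not necessarily the calculation used to produce the figures in the text: it assumes random mating, survival rates of .9 for the wild phenotype and .8 for the ebony phenotype, and equal fertility among survivors. The function names are introduced here for illustration.

```python
def breeding_population(p, fit_wild, fit_ebony):
    """Genotype fractions among adults that survive to breed.

    p is the frequency of the W allele among the parents of this generation.
    """
    q = 1.0 - p
    survivors = {
        "WW": p * p * fit_wild,
        "Ww": 2 * p * q * fit_wild,   # Ww looks wild, so it survives like wild
        "ww": q * q * fit_ebony,
    }
    total = sum(survivors.values())
    return {g: v / total for g, v in survivors.items()}


def simulate(generations, p0=0.70, fit_wild=0.90, fit_ebony=0.80):
    """Breeding-population genotype fractions, one entry per generation."""
    history = []
    p = p0
    for _ in range(generations):
        adults = breeding_population(p, fit_wild, fit_ebony)
        history.append(adults)
        p = adults["WW"] + 0.5 * adults["Ww"]   # allele frequency passed on
    return history


history = simulate(100)
for gen in (1, 2, 3, 10, 100):
    row = history[gen - 1]
    print(gen, {g: round(v, 3) for g, v in row.items()})
```

Under these assumptions the simulated fractions agree with the values quoted in the text to within about .001. Changing `fit_ebony` shows how strongly the speed of the drift depends on the size of the selective disadvantage.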


Let’s tie this back to human intelligence. 
The .9 figure for producing an offspring who 
also lives to reproductive age is far more 
characteristic of human populations than 
of fly populations. 2 As the example shows, 
selective pressures operate to reduce the fre¬ 
quency of genes associated with phenotypes 
that have low reproductive rates. However, 
it may take a long time for the less favorable 
genotypes to disappear from the population. 
This is illustrated in Figure 8.2, in which 
the example has been changed so that the 
two phenotypes have reproductive rates of 
.9 and .5. The figure shows the resulting fre¬ 
quencies of genotypes WW, Ww, and ww, 
extended over fifty generations. We could 
conduct the experiment in flies in about 
three months. An analogous experiment in 
humans would take over 1,500 years. 

While one should always treat hypothet¬ 
ical examples with reservation, it is interest¬ 
ing to consider what this implies in social 
terms. Typically people know their pheno¬ 
type, but not their genotype. Imagine a man 
and a woman, both of whom are of genotype 
Ww. As population figures approach stabil¬ 
ity, six out of every thousand couples would 
fall into this category. Neither would have 
any reason to suspect that they were carri¬ 
ers of the w allele, yet there would be one 
chance out of four that their child would 
be a ww, and hence exhibit the pathological 
phenotype. 

Nevertheless, evolution has made things 
better. At the start of our hypothetical evo¬ 
lutionary cycle approximately one in six 
couples would have been in this situation. 
That is a lot worse than six out of a thou¬ 
sand in the evolved population. 
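
These couple counts follow from squaring the fraction of Ww carriers, on the assumption that mates are chosen at random. The sketch below uses the survival rates of Figure 8.2 (.9 for the normal phenotype, .5 for the pathological one); the function name and the choice of fifty generations as the “evolved” population are assumptions made here for illustration.

```python
def carriers(generations, p=0.70, fit_normal=0.90, fit_affected=0.50):
    """Fraction of Ww carriers among breeding adults after the given number
    of generations of selection (0 means the initial, unselected offspring)."""
    q = 1.0 - p
    frac_Ww = 2 * p * q
    for _ in range(generations):
        WW = p * p * fit_normal
        Ww = 2 * p * q * fit_normal
        ww = q * q * fit_affected
        total = WW + Ww + ww
        frac_Ww = Ww / total
        p = (WW + 0.5 * Ww) / total
        q = 1.0 - p
    return frac_Ww


# A couple is "at risk" when both partners are Ww.
start = carriers(0) ** 2
later = carriers(50) ** 2
child_affected = 0.25   # Ww x Ww: one chance in four of a ww child

print(round(start, 3), round(later, 4), child_affected)
```

The starting value is .42 squared, or roughly one couple in six, and after fifty generations the figure falls to well under one couple in a hundred.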

Could situations like this actually occur 
in modern human populations? They can, 
and they do. Panel 8.1 presents an example, 
Huntington's Disease, and describes a famous 
case. 


2 The replacement rate is the average number of off¬ 
spring per woman required to maintain a stable 
population. In developed countries this is approxi¬ 
mately 2.2, which implies that the probability that 
an individual offspring will live to reproduce is .909. 
Figure 8.2. The effect of evolutionary selection on the distribution
of genotypes. The relative frequency of three genotypes (ordinate) 
is plotted as a function of the number of generations (abscissa). The 
probability that a WW or Ww genotype will live to reproduce is 
assumed to be .9. The probability that a ww genotype (leading to a 
pathological phenotype) will live to reproduce has been set at .5. 
Initially the ww genotype had a frequency of about .05 (one in 
twenty cases in the population). By ten generations it had dropped 
to .01 (1 in 100). The change from thirty generations (.002) to fifty 
generations (.001, 1 in 1,000) was very small. 


8.2.2. On beyond Mendel: Complications 
when Dealing with Continuous Traits 
(like Intelligence) 

The single-gene model does a good job of 
explaining the inheritance of traits like the 
colors and surfaces of pea pods, eye and body 
color in fruit flies, and a surprising number 
of severe pathologies of human intelligence, 
such as Huntington’s disease. Things rapidly 
become more complicated for continuous 
traits, and for traits that are influenced by 
more than one gene. Mendel thought of 
some of these complications; others he had 
no way of anticipating. 

Multiple alleles. A gene can have more 
than two alleles. There is an important 
case, Alzheimer's dementia, in which this is a 
factor. 

Additive genetic potentials for continuous
traits. Alleles do not always fall neatly into 
“dominant” and “recessive” categories. The 
heterozygous form may have a phenotype 
that is intermediate between the phenotypes 
of the two homozygous forms. As a hypo¬ 


thetical example, suppose that the height 
of a plant is determined by a single gene, 
with two alleles. Represent these by A and 
a. If the heterozygous (Aa) genotype has a 
potential that is an additive mixture of the 
two phenotypes, then we might have the 
following situation: 


Genotype    Phenotype    Expected Height

AA          HIGH         20 cm
Aa          MEDIUM       15 cm
aa          LOW          10 cm


Environmental factors complicate the obser¬ 
vation of genetic and phenotypic variation. A 
genotype is associated with the potential for 
expression of a trait. Environmental factors 
may cause the observed value to vary around 
the expected value. In the height example, 
suppose that the frequencies of the A and a 
forms of the gene were both .5 and that environmental
factors produce deviations from
the genetic expectation that are normally 
Panel 8.1. Huntington's Disease:
Woody Guthrie and His Family 

Huntington’s disease: Huntington’s disease
is a severe form of mental and
physical disability with an incidence of 
approximately 1 case in 30,000 people. 
Symptoms usually appear somewhere 
between thirty-five and fifty years of 
age, although symptoms in adolescence 
are not unknown. The initial symptoms 
are loss of short-term memory, then 
tremors and progressive failure of cogni¬ 
tion. Abnormal irritation and aggressive 
behaviors have also been reported. Loss 
of muscular control follows, and then 
death. 

The disease is caused by a dominant 
allele, so affected individuals need have 
only one copy of the gene. If a child of 
a Huntington's disease patient carries the 
allele, he or she is at risk of developing 
symptoms earlier than the parent did. 

The singer Woodrow Wilson (Woody) 
Guthrie (1912-1967) was a prototypical
example of the inheritance of Hunting¬ 
ton’s disease. 

Guthrie was born to poor parents in 
Oklahoma shortly before World War I. 


His mother, who died in an Oklahoma 
mental institution, is now believed to 
have suffered from Huntington’s disease. 
Guthrie's early adult life was marked by 
spectacular success as a folk and social 
protest singer in the 1930-40 depression 
era. His “This Land is Your Land” is con¬ 
sidered a classic of the genre. In mid¬ 
dle age Guthrie developed severe mem¬ 
ory loss, progressive dementia, and other 
signs of deterioration typical of the syn¬ 
drome. He died of the disease at age fifty- 
five. 

Guthrie fathered eight children. Three 
of them died (two in auto accidents) 
at age twenty-one or younger, earlier 
than one would expect the disease to 
express itself. Therefore, we do not know 
if they were carriers of the gene for 
Huntington’s disease. Two of the surviv¬ 
ing children developed Huntington’s dis¬ 
ease. Both died at forty-one, younger than 
their father at the time of his death. The 
remaining three, including the contem¬ 
porary folk singer Arlo Guthrie (1947-), 
are apparently not carriers.* 

* Information from a website supporting a Public 

Broadcasting System tribute to Woody Guthrie, 

Wikipedia, and several biographical articles. 


distributed with a mean of zero and a stan¬ 
dard deviation of 2 cm. Figure 8.3(a) shows 
the distribution of heights that would result 
for each genotype. Figure 8.3(b) shows the 
distribution of phenotypes, that is, the dis¬ 
tribution of observed height. This distribu¬ 
tion was obtained by summing the distri¬ 
butions for each genotype, weighted by the 
probability of the genotype’s occurrence. 

Selection pressure complicates the situa¬ 
tion. In the example of Figure 8.3, I assumed 
that there was no selection pressure. As the 
argument that led to Figure 8.2 showed, 
differences in probabilities of reproduction 
due to selection pressure will affect both 
genotype and phenotype frequencies over 
time. In the simplest case, which was illus¬ 
trated in Figure 8.2, there is a “favorable” and 


an “unfavorable” allele, and the heterozy¬ 
gous genotype produces the phenotype, and 
therefore the reproductive success, associ¬ 
ated with the dominant allele. There are 
cases in which the phenotype associated 
with the heterozygous genotype may have 
a reproductive success rate above, interme¬ 
diate to, or below the success rates of the 
phenotypes associated with either homozy¬ 
gous genotype. Panel 8.2 describes one such 
case. 

Dominance effects. Dominance refers to 
the extent to which one of the alleles in 
a heterozygotic pair dominates the other 
allele. Introductory examples of Mendel’s 
laws usually present examples of complete 
dominance, in which the phenotype for the 
heterozygotic (Aa) form is the phenotype 

Figure 8.3. The distribution of phenotypes with independent 
environmental effects. This example shows the effects expected 
with three different genotypes, aa, Aa, and AA, with uniform
environmental effects that are independent of the genotypical
potential. See text for details. Figure (a) shows the conditional
distribution of phenotypical values, given the genotype. Figure (b) 
shows the overall distribution of the phenotypical values, given 
equal frequency of the two alleles. 


of the dominant (AA) form, thus making
a sharp distinction between dominant
and recessive alleles. When the phenotypical 
trait is continuous, as in the case of intelli¬ 
gence, the heterozygotic phenotype can be a 
compromise between the phenotypes of the 
two homozygotic genotypes. That was the 
case for the hypothetical plant height illus¬ 
tration given earlier. In that illustration the 
heterozygotic phenotypical height was mid¬ 
way between the height associated with the 
two homozygous genotypes. This is not necessarily
the case. For instance, in the plant
height example the Aa genotype might have 


been associated with a genotypical height 
of 18 cm, closer to the 20 cm phenotypical 
height of the AA form. 

In statistical analyses dominance effects 
appear as interactions, for the phenotypic 
effects of the two forms of the allele are not 
additive. 
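
One common way to express this is to split the three genotype values into a midpoint, an additive effect, and a dominance deviation. The sketch below applies that standard decomposition to the plant-height numbers; the function name is introduced here for illustration, and the decomposition itself is a textbook convention rather than something stated in this chapter.

```python
def decompose(height_aa, height_Aa, height_AA):
    """Split genotype values into midpoint, additive effect, and dominance deviation."""
    midpoint = (height_aa + height_AA) / 2.0
    additive = (height_AA - height_aa) / 2.0   # effect of replacing one a with one A
    dominance = height_Aa - midpoint           # departure of Aa from pure additivity
    return midpoint, additive, dominance


purely_additive = decompose(10, 15, 20)     # Aa exactly midway
partial_dominance = decompose(10, 18, 20)   # Aa pulled toward AA
print(purely_additive, partial_dominance)
```

In the purely additive case the dominance deviation is zero, so each A allele simply adds 5 cm. In the 18 cm case the deviation is 3 cm, and it is this nonzero term that shows up as an interaction in a statistical analysis.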

Probabilistic display of a phenotypical char¬ 
acteristic. In some cases genetic makeup 
influences the probability of displaying a 
trait, rather than the form that the trait has. 
This is particularly true of behavioral traits 
that depend upon interactions between 
the individual and the environment. 
Panel 8.2. Sickle-Cell Anemia,
Malaria, and the HBB Gene 

The HBB gene influences the development
of hemoglobin in the blood. Let H
be the dominant form and h the recessive 
form. People who have the hh genotype 
suffer from sickle-cell anemia. Untreated,
people with sickle-cell anemia usually die 
in their twenties. Treated patients now 
live into their thirties and forties. 

People who are heterozygous (Hh) for 
the HBB gene have sickle-cell trait. The 
symptoms of sickle-cell anemia are generally
not present. However, on average 25% of the
children of two parents with sickle-cell trait
will be homozygous (hh), and therefore 
will have sickle-cell anemia. Since there 
is a good chance that a child with sickle¬ 
cell anemia will not live to reproductive 
age, especially if not treated, the repro¬ 
ductive success of the heterozygous par¬ 
ents is reduced. 


To make things more complicated, 
the h allele confers some protection 
against malaria, a mosquito-borne dis¬ 
ease. Worldwide, malaria is a major dis¬ 
ease, especially in developing countries. 
There are 355 million to 500 million cases 
and about one million deaths each year, 
mostly when young children contract the 
disease. The disease has been virtually 
eradicated in developed countries, largely 
due to aggressive measures to control the 
mosquito population. 

If malaria is prevalent, the order of 
reproductive success for the three geno¬ 
types is Hh, HH, hh. If malaria is not 
present, however, the order of reproduc¬ 
tive success is HH, Hh, hh.* 

* Information on sickle-cell anemia retrieved from
www.nlm.nih.gov/medlineplus/ency/article/
0001527.htm. Information on malaria retrieved
from www.nlm.nih.gov/medlineplus/ency/article/
000621.htm, August 28, 2008.


Alcoholism, smoking, and other addictions 
illustrate such behaviors. People whose rel¬ 
atives are addicts are not necessarily slightly 
addicted, but the probability that they will 
become addicts is increased. 

Genes are packaged on chromosomes. 
Genes are inherited in packages, rather than 
being inherited individually. Every cell in 
the body of every living creature contains 
structures called chromosomes, which bear 
the genes for that individual. The chromo¬ 
somes come in two types, the autosomes and 
the sex chromosomes. In mammals the auto¬ 
somes are paired, so that each member of a 
pair contains one of the alleles of a gene pair. 
The number of chromosome pairs varies
with the species. Humans have twenty-two
pairs of autosomes, referred to somewhat
unimaginatively as chromosomes 1-22. An individual
inherits one chromosome in each pair
from the mother and the other from the 
father. 

There are two types of sex chromo¬ 
somes, X and Y. All normal human females 


have two X chromosomes, one inherited 
from each parent. Genetic transmission for 
females is thus the same on all twenty- 
three chromosome pairs. All normal males 
have an X chromosome inherited from the 
mother and a Y chromosome inherited from 
the father. The two chromosomes contain 
different genes. Probabilities of inheritance 
have to be modified accordingly. This is 
shown in the case described in panel 8.3. 

We now turn to the case of inheritance 
controlled by several genes, the polygenic
situation. 

8.2.3. The Effect of Multiple Genes 
(the Polygenic Model) 

Suppose that there are two independently 
distributed genes, A and B, each with two 
equally likely alleles (A,a and B,b), and let 
the favorable alleles (A or B) each add five 
IQ points above the mean of 100 and the 
unfavorable alleles (a or b) each subtract 
five IQ points from the mean. There are 
Panel 8.3. Male Pattern Baldness

Male pattern baldness (loss of hair 
spreading from just above the temples) 
is a sex-linked genetic trait. Male pattern 
baldness is not itself a cognitive trait, nor 
has it been linked to any variation in intel¬ 
ligence. The reason for discussing it here 
is that it is largely under the control of 
genes on the X chromosome. This also 
appears to be the case for many cognitive 
deficiencies. The ancestral relationships 
in male pattern baldness provide an easily 
observed analog to ancestral relationships 
that may be involved in the inheritance 
of some aspects of intelligence. Imag¬ 
ine a young man concerned over future 
hair loss. His father is not bald, but his 
two grandfathers both display male pat¬ 
tern baldness. Should the young man be 
worried? 

Yes. The risk for male pattern bald¬ 
ness depends on an allele of the AR gene, 
located on the X chromosome.* Because 
the young man inherited a Y chromo¬ 
some, but not an X chromosome, from 
his father, the father’s hair pattern is irrel¬ 
evant. By the same token, the father’s 
father (the young man’s paternal grand¬ 
father) is also irrelevant. 

The maternal grandfather is another 
matter. The young man inherited his X 
chromosome from his mother. Mother 


had two X chromosomes, one from the 
man’s maternal grandfather and one from 
the maternal grandmother. Because the 
maternal grandfather has male pattern 
baldness, the maternal grandfather prob¬ 
ably carries the AR allele on his X chro¬ 
mosome. (There are other causes of bald¬ 
ness, but we disregard them for sim¬ 
plicity.) There is a 50% chance that 
the maternal grandfather’s gene was car¬ 
ried forward to the young man, via his 
mother. Thus the probability is at least .5 
that the young man carries the AR allele, 
putting him at risk for baldness. 

Why do I say “at least”? There is also
a chance that the AR allele was carried 
on the X chromosome that his mother 
inherited from her mother (the maternal 
grandmother). More generally, if there is 
a man with male pattern baldness some¬ 
where in the maternal line beyond the 
grandparents, there is a chance that the 
baldness genotype will be carried for¬ 
ward, unexpressed, through the females 
of that line, until it reaches a male. The 
chances of this happening grow smaller 
and smaller with each generation, but 
the chance of inherited baldness never 
vanishes. 

* Information on male pattern baldness retrieved 
from the National Institutes of Health website
ghr.nlm.nih.gov, March 29, 2008. 


now nine possible genotypes with associated 
probabilities and phenotypical intelligence, 
which for simplicity I will denote IQ. The 
values are shown in Table 8.1. 

Several of the genotypes in Table 8.1 pro¬ 
duce the same phenotype, the same poten¬ 
tial for an IQ score. Figure 8.4(a) rearranges 
the data to show the probability of dif¬ 
ferent IQ potentials (phenotypes) regard¬ 
less of the genotype. Figure 8.4(b) does the 
same thing for a more complicated case in 
which, instead of dealing with two genes, 
each of which contributes ±10 IQ points,
we consider a situation in which there are 


ten genes, each with two alleles, and each 
allele either adds or subtracts .3 points of 
potential from the mean of 100. As the 
number of genes increases, the distribu¬ 
tion of IQ potentials approaches the normal 
distribution. 
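
The two-gene case can be checked by direct calculation. The sketch below assumes, as the text does, that every allele is favorable with probability one half and that genes are independent, so the number of favorable alleles is binomially distributed. The function name and parameters are illustrative, not taken from the text.

```python
from math import comb

def iq_potential_distribution(n_genes, effect_per_allele, mean=100):
    """Probability of each IQ potential under the additive polygenic model.

    Each gene carries two alleles. A favorable allele adds
    `effect_per_allele` points to the mean; an unfavorable allele
    subtracts the same amount. Alleles are assumed equally likely
    and independent.
    """
    n_alleles = 2 * n_genes
    dist = {}
    for k in range(n_alleles + 1):  # k = number of favorable alleles
        potential = mean + effect_per_allele * (k - (n_alleles - k))
        dist[potential] = comb(n_alleles, k) / 2 ** n_alleles
    return dist

two_gene = iq_potential_distribution(n_genes=2, effect_per_allele=5)
for potential, p in two_gene.items():
    print(potential, p)
```

For two genes this gives probabilities of 0.0625, 0.25, 0.375, 0.25, and 0.0625 for potentials of 80, 90, 100, 110, and 120, which are the Table 8.1 probabilities summed over genotypes that share a phenotype. Calling the function with `n_genes=10` yields a 21-point, bell-shaped distribution of the kind described above.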

8.2.4. A Simple Polygenic Model 
of Intelligence 

Suppose that genetic and environmen¬ 
tal effects are additive and independent. 
The phenotype, the observable score on a 
test, should be predictable by combining 

Table 8.1. Genotypes and associated IQ potentials for the (artificially simple) case of two
genes affecting intelligence. The assumptions are that each gene has two equally probable
alleles, that a favorable allele contributes 5 points above a mean potential of 100, and that
an unfavorable allele contributes 5 points below the mean potential. The genes are
assumed to be independently distributed.

Genotype   Probability   Expected IQ
aa bb      0.0625         80
aa Bb      0.1250         90
Aa bb      0.1250         90
aa BB      0.0625        100
Aa Bb      0.2500        100
AA bb      0.0625        100
Aa BB      0.1250        110
AA Bb      0.1250        110
AA BB      0.0625        120


genetic and environmental influences, in the 
equation 

X=G+E+e, (8.1) 

where X is the score, G and E are the 
genetic and environmental contributions to 


the score, and e is a residual term due to test 
unreliability. 

Equation 8.1 refers to statistical expecta¬ 
tions and predictions. No one ever achieved 
an IQ score because they had such-and- 
such genetic inheritance and environmen¬ 
tal experience. They achieved the score 
because of how they behaved on the test, 



[Figure: two panels, (a) and (b); ordinate: probability; abscissa: IQ potential.]

Figure 8.4. Probabilities for IQ potential (ordinate) as a function of the
value of the genotypic potential. Calculations are for a two-gene model
(a) and a ten-gene model (b). See text for further details.
which was determined by the state of their
brain (and hence mind, for the brain has a 
mind of its own) at the moment of testing. 
The brain came to be in the state it 
was because of the combined influences of 
genetic heritage and environmental experi¬ 
ence. These are distal causes of an IQ score. 
The behavior was the proximal cause. When 
we start to consider the relative influence 
of genetic and environmental influence on 
intelligence, the distal-proximal distinction 
becomes quite important. 

Equation 8.1 cannot be used to break 
down the score of any individual into 
genetic and environmental components, for 
we would be explaining one observable 
in terms of two unknowns. The equation 
does imply the distribution of scores (and 
hence correlations) across individuals of 
different degrees of genetic relatedness. 
Therefore, we can untangle genetic and 
environmental influences by examining vari¬ 
ations in IQ scores (or other variables) across 
individuals. 

The variation in a set of scores is mea¬ 
sured by the standard deviation; the larger 
the standard deviation, the more variable 
the trait underlying the score. The vari¬ 
ance is the square of the standard devia¬ 
tion. The covariance measures the extent to 
which two scores vary together. Equation 8.1 
implies that 


Var(X) = Var(G) + Var(E) + 2 Cov(G, E), (8.2)

where Var indicates variance and Cov indicates
covariance. The term on the left,
Var(X), is the variation of the observed
test scores. The first term on the right,
Var(G), measures the variation in the
genetic makeup of the population; Var(E) is
an index of the variation in the environment;
and the third term, 2 Cov(G, E), reflects the
extent to which genetic and environmental
contributions are correlated.
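
The decomposition can be verified numerically. The sketch below is illustrative only: the distributions, the sample size, and the 0.3 coefficient that makes E track G are hypothetical choices, and the residual term e of equation 8.1 is ignored, as it is in equation 8.2.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance; cov(xs, xs) is the variance of xs."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

random.seed(1)
n = 10_000
G = [random.gauss(0, 10) for _ in range(n)]
# Hypothetical gene-environment correlation: E partly tracks G.
E = [0.3 * g + random.gauss(0, 8) for g in G]
X = [g + e for g, e in zip(G, E)]  # equation 8.1 without the residual term

lhs = cov(X, X)                               # Var(X)
rhs = cov(G, G) + cov(E, E) + 2 * cov(G, E)   # right-hand side of equation 8.2
print(round(lhs, 6), round(rhs, 6))
```

The two printed values agree up to floating-point rounding, because the identity holds for any set of scores formed as X = G + E, not only in expectation.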

The extent to which a trait is heritable 
is defined as the proportion of variance in 
the trait that is due to variance in genetic 


contribution. This is referred to as broad 
sense heritability, 


h² = Var(G) / Var(X)
   = Var(G) / [Var(G) + Var(E) + 2 Cov(G, E)]. (8.3)


This coefficient is the “statistic of choice” 
in many discussions of genetic influences on 
intelligence. 

Heritability is the proportion of phe¬ 
notypic variance associated with genotypic 
variance. It is important to remember (a) 
that variance is a measure defined on populations,
not on the individuals in those populations,
and (b) that variance is independent
of the expected (mean) value.
These two conclusions also apply to h². It is
a useful statistic, if we keep certain restric¬ 
tions in mind. 

(a) Unless h² is zero or one we do not know
the extent to which any individual's
intelligence, or any other phenotypic
trait, has been determined by heredity
or environment. For instance, modern
studies of h² in European and North
American samples usually result in an
estimated h² of .5 to .8, depending
on certain characteristics of the sample,
such as age. It does not follow that
there is any person in the sample whose 
personal intelligence is 50-80% due to 
genetic inheritance. 

(b) Because the variance is independent of 
the mean, an environmental variable 
can change the average level of a trait 
in that population without influencing 
the heritability. A good example is the 
height of Japanese-Americans. Height 
is a highly heritable trait. In addition, 
adult height is strongly dependent on 
the quality of infant nutrition. At the 
time of the major Japanese immigration 
to the United States, in the late nine¬ 
teenth century, infant nutrition in Japan 
was poor by modern standards. A cen¬ 
tury later Japanese-Americans who are 
entirely descendants of the early immigrants
(i.e., who have no non-Japanese
ancestors) are substantially taller than
their forebears, even though they come
from the same gene pool.

Could the same thing happen with 
intelligence? There are indications that 
it has. In the twentieth century there 
was a marked rise in the IQ scores of 
people in the developed nations. This 
change, over only three human generations,
occurred far too quickly to have
been due to genetic changes. 

(c) The heritability coefficient is influenced 
by three things: the extent of genetic 
variation in the population; the pene¬ 
trance, which is the extent to which the 
value of the phenotypic trait is deter¬ 
mined by the genome; and the extent 
of relevant environmental variation in 
the population. If a population consists
of genetically diverse individuals,
the heritability coefficient will go up; to
the extent that the population consists
of genetically homogeneous individuals,
the heritability coefficient will go down.
Conversely, if relevant environmental
variation goes up, the heritability coefficient
goes down, and vice versa.
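
Restriction (b) rests on a simple property of the variance, which a few lines of Python make concrete. The five scores and the 10-point gain below are hypothetical numbers chosen only for illustration.

```python
from statistics import mean, pvariance

# Hypothetical scores for a small population (illustrative numbers only).
scores = [88, 95, 100, 104, 113]

# Suppose an environmental improvement raises every score by 10 points.
shifted = [s + 10 for s in scores]

print(mean(scores), pvariance(scores))    # mean and variance before the change
print(mean(shifted), pvariance(shifted))  # mean rises, variance is unchanged
```

Because a uniform shift leaves every variance and covariance term untouched, any ratio of variances, including the heritability coefficient, is untouched as well.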

Point (c) is obvious when you inspect
equation 8.3, because the term for genetic
variance appears in both the numerator
and the denominator, while the term
for environmental variance appears only
in the denominator. However, people do
not always think mathematically, especially
when the discussion of the genetics of intelligence
becomes heated, so let us examine a
few cases.

In the extreme, an army of clones (as in
the science fiction Star Wars films) would
have no genetic variability, and hence the
heritability coefficient would be zero. This
extreme is never reached in human populations,
but different populations do vary a
good deal in the extent of their genetic
variability. Sub-Saharan African populations
display considerably greater genetic varia¬ 
tion than other populations. The genetic 
variability in sub-Saharan Africa should 


drive the heritability coefficient upward 
compared to genetic variability in other large 
populations. 

Small, isolated populations generally
show low genetic variability. The inhabitants
of Norfolk Island in the Pacific Ocean
provide a historically interesting case.
Virtually all the modern inhabitants of Norfolk
Island are descended from nine British
sailors and twelve Polynesian women who
fled to nearby Pitcairn Island after the
famous Bounty mutiny in 1789. (Their
descendants moved to richer, and then
uninhabited, Norfolk Island in 1856.) We would
expect the heritability coefficient to be low
in the present-day Norfolk Islanders, due to
low genetic variability.

Heritability also depends upon the extent 
to which there is environmental variation 
relevant to intelligence. Increases in relevant 
environmental variation decrease heritabil¬ 
ity; decreases in relevant environmental vari¬ 
ation increase heritability. To illustrate the 
point I offer a thought experiment. 

I conjecture that h² in the population
of England has gone up since the nineteenth
century, Galton's time. Why? For
simplicity, let us disregard immigration,
and assume that all present English citizens
are descended from the English population
of the nineteenth century. As a
first approximation, assume that the variance
in the genetic contribution to intelligence,
Var(G), is the same today as it
was in the nineteenth century. As Dickens’s
novels so graphically portray, in the nineteenth
century the economic, educational,
and health differences between the rich and
poor were stark. In the England of today
reasonable health care, education, and nutrition
are available to nearly everyone. In terms
of equation 8.3, the variance in the environmental
influence on intelligence, in the
denominator of the equation, has decreased.
Therefore, if the genetic variation is constant,
the heritability measure, h², will have
gone up.
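
The arithmetic behind the thought experiment can be sketched in a few lines. All variance values below are hypothetical, chosen only to show the direction of the effect, and the function name `heritability` is an illustrative choice.

```python
def heritability(var_g, var_e, cov_ge=0.0):
    """Broad-sense heritability as defined in equation 8.3."""
    return var_g / (var_g + var_e + 2 * cov_ge)

var_g = 100.0  # genetic variance, held constant as in the thought experiment

h2_then = heritability(var_g, var_e=150.0)  # large environmental differences
h2_now = heritability(var_g, var_e=50.0)    # smaller environmental differences
h2_clones = heritability(0.0, var_e=50.0)   # no genetic variation at all

print(round(h2_then, 3), round(h2_now, 3), h2_clones)
```

With these made-up numbers the coefficient rises from 0.4 to about 0.667 when environmental variance falls, and it is zero for a population with no genetic variation, even though nothing about the genes themselves has changed.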

My idea about heritability in nineteenth- 
century England was labeled a conjec¬ 
ture because I do not have measures of 
either genetic or environmental variation 
in nineteenth-century England. However, a
modern study, that had half the data, sup¬ 
ports the conjecture. 

In modern European-North American 
industrial and post-industrial populations, 
estimates of h² fall in the range .40 < h² <
.60. This value may not hold in subpopulations.
Studies in the United States have
found h² values in the .5-.6 range for school-age
children whose parents are of middle
to high socioeconomic status (SES), but 
values of less than .40 for the children of 
low SES parents. 3 The investigators sug¬ 
gested that this is because variations in envi¬ 
ronment above a threshold quality, which 
most middle and high SES families exceed, 
have relatively little effect on intelligence. 
However, variations in environmental qual¬ 
ity below this threshold, which may occur 
in many low SES families, have a substan¬ 
tial effect on intelligence. This is a reason¬ 
able explanation of the findings, and there is 
some evidence, based upon direct observa¬ 
tion of the differences in children’s environ¬ 
ments in high and low SES homes, that the 
hypothesized environmental differences do 
exist. 4 

Gene x environment correlations are
expressed by the Cov(G, E) term in
equation 8.2. It is possible that there are
genetic and environmental influences that,
causally, have separate influences on a phenotypic
trait, including intelligence, and
whose distributions are correlated in the
population. Because the Cov(G, E) term is
in the denominator in equation 8.3, if the
correlation between genetic and environmental
effects goes up, h² will come down.
That is simple algebra. The causal relations 
are more interesting, for at times discussions 
of the inheritance of intelligence have been 
faulted by a failure to consider gene x envi¬ 
ronment correlations. The issue has been 
particularly important when conclusions are 
based upon an observation of a correlation 
between a person's intelligence and par¬ 
ental SES. 

3 Harden, Turkheimer, & Loehlin, 2007; Turkheimer 

et al., 2003. 

4 See Nisbett, 2009, for a review of this evidence. 


Galton’s Hereditary Genius, one of the
seminal nineteenth-century books on intel¬ 
ligence, is almost a prototypical example of a 
failure to consider gene-environment corre¬ 
lations. Galton examined historical records 
of “eminent men,” ranging from members 
of Parliament to war heroes, and found that 
they came from a relatively few families. He 
concluded: 

The direct result of this inquiry is to make
manifest the great and measurable differences
between the mental and bodily faculties
of individuals, and to prove that the
laws of heredity are as applicable to the
former as the latter. Its indirect result is to
show that a vast but unused power is vested
in each generation over the very nature of
their successors - that is, over their inborn
faculties and dispositions.

Francis Galton, Hereditary 
Genius, Introduction 
to the 1892 reprinting 

Galton ignored the possibility that great 
families have social and economic resources 
that further their children’s careers in ways 
that have nothing to do with biology. Could 
this happen today? What do you think? To 
assist you, panel 8.4 provides an American 
example that may be food for thought. 

Galton leaped to a genetic conclusion, 
ignoring the effects of gene x environment 
correlations. Let us look at a contemporary 
error, but in the opposite direction. 

One of the charges made against the use 
of the SAT as a college entrance require¬ 
ment is that because examinees from high 
SES families obtain high test scores, the 
SAT perpetuates the “unfair” advantage that 
these examinees have because of their social 
background. 5 This argument turns Galton’s
conclusion on its head; the observation that 
there is a correlation between parental SES 
and SAT scores is taken as proof that chil¬ 
dren from well-to-do families have an unfair 
advantage because family wealth has been 
used to construct good environments for 
learning, including coaching on how to take 
the test. Therefore, the critics aver, the SAT 
should not be used to screen applicants to 

5 Lemann, 1999. 
Panel 8.4. Has Anyone Ever
Inherited the American Presidency? 

As of 2010 the United States had had
forty-four presidents. Four American families
contributed two presidents apiece:
two father-son combinations (John and 
John Quincy Adams, George H. W. 
and George W. Bush), one grandfather- 
grandson combination (William and Ben¬ 
jamin Harrison), and two distant cousins 
(Theodore and Franklin Roosevelt). The 
probability of this occurring by chance 
is minuscule; even in the Adams’ time
the population of the United States num¬ 
bered in the millions. Does this mean that 
the United States never completely shook 
off the genetic determinism exemplified 
by the British royal family? 

We should not arbitrarily rule out 
the possibility that genetics contributed 
to these men’s ability to behave in a 


way that would ultimately bring them to 
the presidency. But let us consider some 
gene by environment correlations. It is 
well documented that the elder Adams 
and Bush assisted their sons in their 
early careers. (An unfriendly newspaper 
columnist described George W. Bush as 
having been born with a silver shoe in 
his mouth.) Theodore and Franklin Roo¬ 
sevelt were born to separate branches 
of the very rich Roosevelt family. They 
both suffered severe illnesses (Theodore 
in childhood, Franklin as an adult) that, 
in less affluent individuals, would have 
severely restricted their career prospects. 
Each Roosevelt received the best treat¬ 
ment that the medical science of the 
day could offer, and price was not an 
issue. 

So was ascension to the presidency 
dominated by genetics or environment? 
It is impossible to say. 


college, because doing so will solidify class 
lines. 6 This conclusion could be right, but it 
could be that relatively wealthy, highly edu¬ 
cated parents pass on a genetic advantage 
to their children. If this is the reason that 
children from higher SES families get better 
SAT scores, then a different argument has 
to be made about the use or nonuse of the 
test. 

In fairness to the many investigators who 
have conducted research in this area, I has¬ 
ten to add that it is not easy to measure 
gene x environment correlations. The con¬ 
ceptual issue is easy to grasp. In practice, 
though, we have a model for calculating 
genetic resemblances, but we do not have 
any such model for calculating environmen¬ 
tal resemblances. Here is the problem. 

Mendelian genetics provides a model for 
the genotypic resemblance between two 
individuals with specified family relation¬ 
ships. You obtain half your genetic material 
from your mother and half from your father; 

6 Atkinson, 2005. 


going back a bit, you obtain a quarter of your 
genetic material from each grandparent. We 
do not have any such model to specify envi¬ 
ronmental relationships. The difference in 
genetic distance between grandparents, par¬ 
ents, and self is clear. The difference in envi¬ 
ronmental distance between generations is 
not clear at all. 

As a result, investigators either treat envi¬ 
ronmental effects as an unmeasured resid¬ 
ual - that is, a “left over” effect - or they try 
to measure environmental resemblances by 
determining the correlations between cog¬ 
nitive measures and proxy measures of the 
environment, such as parental education, 
income, or, in one case I have seen, gov¬ 
ernment ratings of a neighborhood as being 
“poor,” “middle-class,” “upper middle-class,” 
or “rich.” In many studies only one such vari¬ 
able (e.g., parental education) is measured. 
There is a good chance that some crucial part 
of the environment will be overlooked. The 
size of the effect associated with the missing 
part can be measured, but the nature of the 
effect remains unknown. 
8.2.5. Looking inside Genetic Effects

Equations 8.1-8.3 treated genetic effects as 
a single package, G. Conceptually, G is the 
sum of four components:

G = A + D + I + (G : E), (8.4)

where A refers to additive genetic effects, 
D refers to dominance effects, I refers to 
a phenomenon called epistasis, and (G : E) 
refers to gene-environment interactions. Let 
us take each of these in turn. 

A is the sum of the independent effects 
of the genes. The D term adjusts for domi¬ 
nance, the fact that the influence of two alle¬ 
les of the same gene may not be strictly addi¬ 
tive. In a gene with two alleles, dominance 
occurs if the trait expressed by the heterozygotic
case does not represent an average
of the expressions of the two homozygotic 
cases. 

Epistasis, the I term, refers to interactive 
effects between different genes. 7 Epistasis 
occurs if the expression of one gene depends 
upon alleles present in another gene. To take 
an oversimplified example from evolution, 
the presence of feathers does not guarantee 
flight, nor does the presence of light bone 
structure in the forelimbs. If you combine 
them, and make a few other adjustments to 
the limb structure, none of which alone per¬ 
mits flight, you have a bird! 

As this example illustrates, on the evo¬ 
lutionary scale epistatic effects have been 
spectacular. Within the field of human intel¬ 
ligence, though, epistatic effects have vir¬ 
tually been ignored. This may be a mis¬ 
take, for as we gain a better idea of how 
the genes work we are seeing more and 
more cases in which one part of the genome 
regulates the expression of another part. 
At present, though, we are only begin¬ 
ning to identify the genes involved in intel¬ 
ligence. When these are known, epistatic 
effects may come to play a greater part in 
our thinking about intelligence than they 
do now. 

7 Plomin et al., 2008, p. 377.


The (G : E) 8 term refers to gene-environ¬ 
ment interactions. A gene-environment 
interaction occurs when an environmental 
variable alters the expression of a genetic 
potential, or when the genetic character¬ 
istic changes the influence of an envi¬ 
ronmental variable. Alcoholism is a good 
example. Protracted alcoholism can have 
disastrous effects on cognition. Susceptibil¬ 
ity to alcoholism has a substantial genetic 
component. 9 In societies in which alcohol is 
widely used as a recreational drug, a person 
who has an inherited susceptibility to alco¬ 
hol is at risk for cognitive deficit. There is 
no genetic risk in a society that does not use 
alcohol. 

Gene-environment interactions are not 
the same as gene-environment correlations. 
A gene-environment correlation occurs 
when genetic and environmental influences 
make contributions to a trait that are inde¬ 
pendent within an individual, but have cor¬ 
related distributions across the population. 
For instance, the argument against Galton’s 
conclusion about genius, and the extent to 
which parental SES contributes to a per¬ 
son’s SAT score, are examples of gene- 
environment correlations. 

With a few exceptions (notably in 
the work of Urie Bronfenbrenner and
Stephen Ceci of Cornell University), 10 gene- 
environment interactions relevant to intelli¬ 
gence have not received very much atten¬ 
tion. Part of the problem is a technical one. 
It is difficult to separate gene-environment 
interactions and gene-environment correla¬ 
tions by statistical analysis. Hopefully this 
will change as better models are developed. 11 
There is also the continuing problem of 
deciding what environmental variables to 
measure. In some cases, though, the prob¬ 
lem is conceptual. The alcoholism example 
was clear-cut. But what happens when the 

8 This term is often written G x E. I have chosen a 
different notation to avoid confusion between the 
genetic concept of interaction and the statistical 
concept, which depends upon the multiplication 
operation. 

9 Plomin et al., 2008, pp. 270-277. 

10 Bronfenbrenner & Ceci, 1994; Ceci, 1990.

11 Johnson (2007) provides a good candidate for such 
a model. 
interaction is social? Consider the case of
children’s intelligence. 

Children's IQ scores are correlated with 
parental SES, with values in the .3-.4 range.
Why? One possibility is that the effect is 
entirely environmental. Higher SES parents 
spend more time interacting with their chil¬ 
dren in ways that develop cognition than do 
lower SES parents. 12 For instance, the high 
SES parent is more likely to encourage chil¬ 
dren to solve problems on their own than is 
the low SES parent. The physical environ¬ 
ment is also better in the high SES home; 
the high SES home provides more books 
and creative toys. It could be argued that 
this is an example of a gene-environment 
correlation; the lower SES children would 
have higher IQs if they had the home advan¬ 
tages of the high SES children. It may be, 
though, that there are genetic influences. 
High SES parents may be genetically pre¬ 
disposed to interact with their children in 
particular manners, and the high SES child 
may have inherited a genetic potential to be 
receptive to these interactions. 

This example shows how difficult it is 
to untangle interaction effects when we are 
comparing a distal and a proximal influence. 
The genotype does not influence IQ scores 
in the same sense that it influences eye color. 
The IQ score is determined by the exami¬ 
nee's behavior in a testing situation, which 
is in turn determined by the examinee’s 
experiences and ability to learn from them. 
However, people influence the way oth¬ 
ers react to them. Therefore, the genotype 
exerts a distal influence on IQ scores, while 
experiences exert proximal influences. Add 
to this the fact that there are correlations 
between environmental opportunity and 
genetic constitution, and it is easy to see why 
sophisticated statistical models are needed 
to separate main effects, interactions, and 
correlations. 

8.2.6. Calculating Genetic Variance 

Quantitative behavior genetics (QBG) 
attempts to identify the extent to which 

12 Nisbett (2009) presents this argument in detail. 


variations in human traits are associated 
with genetic or environmental influences. 
In this section I will try to explain how this 
is done. I will start with a simplified model 
that illustrates the basic principles, and then 
explain the more realistic ACE model. First 
a few words are in order about the QBG 
approach itself. 

QBG deals with statistical associations 
between variation in genetic and environ¬ 
mental makeup and variation in displays 
of intelligence, usually test scores. This 
is necessary because it is impossible to 
test causal hypotheses by controlling either 
genetic makeup or environmental condi¬ 
tions in humans, in the way that might 
be done in studies of plants or nonhuman 
animals. 

Probably the easiest examples of QBG 
to understand are analyses of similarities 
and differences between monozygotic (MZ, 
“identical”) and dizygotic (DZ, “fraternal”) 
twins. Monozygotic (MZ) twins result from 
the splitting of a single fertilized ovum into 
two genetically identical ova, while dizy¬ 
gotic (DZ) twins result from the fertiliza¬ 
tion of two ova during a single act of sexual 
intercourse. This provides a “natural exper¬ 
iment,” in which the MZ twins are genet¬ 
ically identical, while the DZ twins, like 
siblings (SS), share half of the permissible 
human variation in genetic material. 13 The 
difference is important. 

The correlation between scholastic 
achievement test scores of ten-year-old 
identical (monozygotic, MZ) twins is about 
.78, varying a little with the academic topic. 
The correlation for fraternal (dizygotic, 
DZ) twins is about .50. 14 Intuitively the 
higher correlation in MZ twins suggests that 
genetic similarity is associated with similar 
academic accomplishments. 
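
One common way to turn this intuition into rough numbers is Falconer's formula, which is not part of the argument above and is offered here only as an illustration. It assumes purely additive genetic effects, equally similar environments for both kinds of twins, and DZ twins who share half of the variable genetic material. The function name is an illustrative choice.

```python
def falconer(r_mz, r_dz):
    """Rough variance shares from twin correlations (Falconer's formulas).

    Assumes additive genetic effects and equal shared environments
    for MZ and DZ twins; real analyses use more careful models.
    """
    h2 = 2 * (r_mz - r_dz)  # share associated with genetic variation
    c2 = 2 * r_dz - r_mz    # share associated with shared environment
    e2 = 1 - r_mz           # non-shared environment plus measurement error
    return h2, c2, e2

h2, c2, e2 = falconer(r_mz=0.78, r_dz=0.50)
print(round(h2, 2), round(c2, 2), round(e2, 2))
```

With the correlations quoted above, this back-of-the-envelope estimate gives a genetic share of about .56 and about .22 each for shared environment and for the remainder. The three shares sum to one by construction.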

13 Statements that two individuals share a certain per¬ 
centage of their genes are a bit misleading. Most of 
the genes in the human genome are shared with all 
animals, and many are shared with plants. Other 
genes are invariant over all people. These are the 
genes that define the species. The statement that 
two individuals share x% of their genes refers to the 
extent to which the genotypes of two individuals 
covary over those genes that can vary in humans. 

14 Plomin et al., 2008, Table 9.5.

Figure 8.5. The relationship between genetic
and environmental contributions in a single
individual.


To go further, we need a more precise 
model of how the observed correlations are 
produced. Modern QBG relies heavily on
a statistical technique called path analysis 
or, somewhat more frequently, structural 
equation modeling. The factor analytic tech¬ 
niques used in psychometrics (Chapter 4) 
are special cases of structural equation mod¬ 
eling. Historically, though, the factor ana¬ 
lytic techniques were developed first. 

Here is an illustrative simple model. Suppose
that the genetic and environmental
components are independent of each other.
The phenotype for person i can be expressed
as

$$X_i = h_x G_i + e_x E_i \qquad (8.5)$$

where $X_i$ is person i's score on test x (or
any other record of a trait), $G_i$ is i's genetic
potential, and $E_i$ refers to environmental
effects that are statistically independent of,
and hence unpredictable by, genetic potential.
This model is shown in Figure 8.5.

The goal of the analysis is to determine 
the values of h and e, the coefficients repre¬ 
senting the strength of the genetic and envi¬ 
ronmental influences. These are called the 
path coefficients. We cannot measure any¬ 
thing but X directly, so we cannot compute 
a simple regression of X on G and E. What 
we can observe, however, are the correla¬ 
tions (covariances) between the X values for 
people of known relationship. The variance 
in a trait can be broken down into its com¬ 
ponents; 

$$\mathrm{Var}(X) = h^2\,\mathrm{Var}(G) + e^2\,\mathrm{Var}(E) + 2he\,\mathrm{Cov}(G, E) \qquad (8.6)$$


where Var indicates a variance and Cov 
a covariance. In words, equation 8.6 says 
that the variance in test scores is equal to 
the variance in genetic potential, weighted 
by the square of the genetic path coeffi¬ 
cient, plus the variance in the environmen¬ 
tal potential, weighted by the square of the 
environmental path coefficient, plus a term 
that depends upon the two path coefficients 
and the covariance between the genetic and 
environmental potentials. 

Since the scales of all variables are arbi¬ 
trary, we can deal with standardized vari¬ 
ables, all of which have a mean of zero 
and a standard deviation of 1. Equation 8.6 
becomes 

$$1 = h^2 + e^2 + 2he\,r(G, E), \qquad (8.7)$$

where r(G, E) refers to the correlation 
between genetic and environmental poten¬ 
tials. 

If we assume that there are no
gene-environment correlations, $r(G, E) = 0$,
equation 8.7 becomes

$$1 = h^2 + e^2. \qquad (8.8)$$

The parameter $h^2$ can be interpreted as
the proportion of variance in the phenotypical
trait (X) that is associated with additive
genetic effects. In general, a path coefficient
is associated with every arc in the graphic
version of the structural equation model. If
there is no arc between two entities, as is the
case for the G and E entities in Figure 8.5, a
path coefficient of 0 is implied.
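
The arithmetic of equations 8.6 through 8.8 can be checked with a few lines of Python. This is only a sketch: the function names `phenotypic_variance` and `environmental_share` are hypothetical labels introduced here, and the numerical inputs are illustrative rather than estimates from any study.

```python
def phenotypic_variance(h, e, var_g=1.0, var_e=1.0, cov_ge=0.0):
    """Equation 8.6: Var(X) = h^2 Var(G) + e^2 Var(E) + 2 h e Cov(G, E).

    The defaults correspond to standardized G and E with no
    gene-environment covariance (the assumption behind equation 8.8).
    """
    return h ** 2 * var_g + e ** 2 * var_e + 2.0 * h * e * cov_ge


def environmental_share(h2):
    """Equation 8.8: with standardized variables and r(G, E) = 0, e^2 = 1 - h^2."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("h2 must lie between 0 and 1")
    return 1.0 - h2


# Illustrative path coefficients chosen so that h^2 + e^2 = 1.
h, e = 0.6, 0.8
print(phenotypic_variance(h, e))              # standardized case, no covariance
print(phenotypic_variance(h, e, cov_ge=0.5))  # a nonzero r(G, E) adds 2*h*e*r
```

With a nonzero covariance the total is no longer $h^2 + e^2$, which is why the simple model has to assume $r(G, E) = 0$ before $h^2$ can be read as a share of variance.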

The next step is to make use of the fact 
that we know the degree of genetic rela¬ 
tionship between any pair of relatives - for 
example, between siblings or between chil¬ 
dren and grandchildren. This lets us expand 
the basic model, as shown graphically in 
Figure 8.6. Instead of considering a single 
person, i, we consider two people, i and 
i', who stand in some known relation to 
each other - for example, MZ or DZ twins. 
Obviously there are equivalent equations 
and diagrams for each person individually. 
Figure 8.6 puts the two figures together,
as indicated by the double-headed arrows,
which denote correlations between the two
people's environments, $r(E_i, E_{i'})$, genetic
constitutions, $r(G_i, G_{i'})$, and scores on the
trait in question, e.g. IQ scores, $r(X_i, X_{i'})$.

Figure 8.6. A model of the relation between genetic and
environmental effects in two individuals, i and i'. Known path
parameters are shown in boldface. If the kinship relation between
the individuals is known (sibs, first cousins, etc.), the genetic
correlation, $r(G_i, G_{i'})$, is known from genetic theory. The
correlation between scores on trait X, $r(X_i, X_{i'})$, can be observed.
Other parameters must be estimated. If one can assume that the
environments are independent, then the environmental correlation
$r(E_i, E_{i'})$ can be set to zero.

The path coefficients and correlations, 
collectively, constitute the observables and 
the parameters of the model. The observ¬ 
able, shown in boldface in Figure 8.6, is the 
correlation between test scores. The param¬ 
eters are either fixed or free. Fixed param¬ 
eters (also shown in boldface in the fig¬ 
ure) are parameters that are not directly 
observable, but are fixed by theoretical 
considerations. This is the point where 
QBG uses genetic theory. The correlation 
between genetic potentials is a fixed param¬ 
eter, because genetic theory specifies the 
fraction of genetic variation shared by two 
related people. This can vary from 1, for
monozygotic (MZ) twins, to 0 for genetically
unrelated individuals. The h and e
coefficients, which represent the effects of 
heredity and environment, are free parame¬ 
ters. Path analysis is used to find values for 


the free parameters that, when combined 
with the fixed parameters, reconstruct the 
correlations between observables as closely 
as possible. 

It can be proven, although I will not do
so here, 15 that the correlation between two
observable variables will be equal to the
sum of the products of the coefficients along
each of the paths linking them in the diagram.
Therefore, in this model the correlation
between the X and X' scores is calculated
across pairs of people of a given degree
of relationship, such as all MZ or all DZ twin
pairs.

$$r(X, X') = h_x\,r(G, G')\,h_x + e_x\,r(E, E')\,e_x$$

$$r(X, X') = h_x^2\,r(G, G') + e_x^2\,r(E, E') \qquad (8.9)$$

15 See Bollen, 1989, or Loehlin, 2004, for discussions.

Equation 8.9 expresses one observable,
$r(X, X')$, in terms of four values on the
right-hand side. Such equations are
mathematically indeterminate, because more than
one value for the right-hand side variables
could be chosen to satisfy the equation.
(This is generally the case whenever there
are more values to be calculated than
observables.) The solution to the problem is to
consider different degrees of genetic and
environmental relationships simultaneously.
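
The indeterminacy can be made concrete with a short Python sketch. It assumes standardized variables, so that $e^2 = 1 - h^2$ (equation 8.8), and uses a hypothetical observed correlation of .47 for pairs with $r(G, G') = .5$; the function names are illustrative only.

```python
def expected_correlation(h2, r_g, r_e):
    """Equation 8.9 with standardized variables: h^2 r(G, G') + e^2 r(E, E')."""
    e2 = 1.0 - h2
    return h2 * r_g + e2 * r_e


def environmental_correlation_needed(r_x, h2, r_g):
    """Solve equation 8.9 for r(E, E') given an assumed value of h^2."""
    e2 = 1.0 - h2
    return (r_x - h2 * r_g) / e2


# One observed correlation, several very different "explanations" of it.
observed, r_g = 0.47, 0.5
for h2 in (0.20, 0.48, 0.70):
    r_e = environmental_correlation_needed(observed, h2, r_g)
    rebuilt = expected_correlation(h2, r_g, r_e)
    print(f"h2={h2:.2f}  r(E,E')={r_e:.3f}  reconstructed r(X,X')={rebuilt:.2f}")
```

Every line of output reconstructs the same observed correlation, so a single type of pair cannot distinguish a low-heritability account from a high-heritability one.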

As an example, suppose that we study 
two samples - one of same-sex siblings raised 
in their parents’ home, and one of same-sex 
siblings in which each sibling is adopted into 
a different home. We make the following 
assumptions: 

1. The adoptions occur very early in 
the children’s life, ideally shortly after 
birth. 

2. The adoption agency does not prac¬ 
tice selective placement. There was no 
attempt to match the characteristics of 
the adoptive and biological parents. 

Under these assumptions it is reasonable
to assume that the environmental correlation
between the adopted siblings, $r(E, E')$,
is zero. 16 The equation for the correlation
between two individuals adopted
into different homes, equation (8.9),
simplifies to

$$r_{\text{apart}}(X, X') = h^2\,r(G, G'). \qquad (8.10)$$

At this point QBG introduces genetic
theory. Genetic theory tells us that for DZ
twins and same-sex siblings, $r(G, G') = .5$.

The next step is to introduce observables.
As quite a few studies of siblings
raised together and apart have been
conducted, we have a good idea of the value
of the correlation between test scores,
$r(X, X')$, for siblings raised together or
apart: raised together .47, raised apart .24. 17

16 This assumption is widely made. However, it can be 
challenged. Nisbett (2009) has speculated that the 
very fact that two families are willing to adopt may 
make home environments similar in ways that are 
important to cognition. 

17 Plomin et al., 2008, Figure 8.7. 


Therefore, for the siblings raised apart

$$r_{\text{apart}}(X, X') = h^2(.5) \qquad (8.11)$$

$$h^2 = \frac{r_{\text{apart}}(X, X')}{.5}$$

$$h^2 = .48.$$

For siblings raised together there will be
some unknown but possibly nonzero
environmental correlation, $r(E, E')$, reflecting the
fact that two children raised in the same
family are likely to have similar environments:

$$r_{\text{together}}(X, X') = h^2(.5) + e^2\,r(E, E'). \qquad (8.12)$$

Making appropriate substitutions for 
observed and theoretically established val¬ 
ues, 

$$h^2 \times (.5) = .24 \quad \text{from equation 8.11}$$

$$e^2 = 1 - h^2 \quad \text{from equation 8.8} \qquad (8.13)$$

$$h^2 \times (.5) + e^2\,r(E, E') = .47 \quad \text{from equation 8.12.}$$

This is a system of three equations
in three unknowns, $h^2$, $e^2$, and $r(E, E')$,
and it can be solved one line at a time.
Rearranging the first line establishes
that $h^2 = .48$. Substituting this value
into the second line, $e^2 = .52$. Using these
values to solve the remaining equation,
we have $r(E, E') = (.47 - .24)/.52 = .44$.
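
The substitution steps can be written out as a small Python function. This is a sketch of the arithmetic in equations 8.11 through 8.13 only; the name `solve_simple_model` is a hypothetical label, and the function assumes, as the text does, that $r(E, E') = 0$ for pairs raised apart.

```python
def solve_simple_model(r_apart, r_together, r_g=0.5):
    """Solve the three equations of (8.13) in order.

    r_apart    : observed r(X, X') for pairs raised apart (r(E, E') assumed 0)
    r_together : observed r(X, X') for pairs raised together
    r_g        : genetic correlation fixed by genetic theory
    """
    h2 = r_apart / r_g                    # from equation 8.11
    e2 = 1.0 - h2                         # from equation 8.8
    r_e = (r_together - h2 * r_g) / e2    # from equation 8.12
    return h2, e2, r_e


h2, e2, r_e = solve_simple_model(r_apart=0.24, r_together=0.47)
print(round(h2, 2), round(e2, 2), round(r_e, 2))  # 0.48 0.52 0.44
```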

Conceptually, the model tells us that 
if our assumptions are correct the variance 
in the test scores is approximately equally 
associated with genetic variance and vari¬ 
ance in the environment that is statistically 
independent of genetic variance, and that 
siblings raised together experience statisti¬ 
cally related, but not identical, environmen¬ 
tal influences. 

But how do we tell if our assumptions are 
correct? 

8.2.7. Fitting Models to Data 

The example just presented illustrates a
saturated model. In saturated models only
one set of parameters will satisfy the data,
and these parameters can be used to
reconstruct the observed matrix of correlations or
covariances perfectly. While studying
saturated models can be informative, the fact
that the parameters inevitably reconstruct
the observations means that there has been
no reduction in the complexity of our
explanation. To illustrate, in the example just
given only two parameters, $h^2$ and $r(E, E')$,
were estimated from the data, because the
value of $e^2$ was dictated once $h^2$ was
determined. Therefore, we explained two
observations, $r_{\text{together}}(X, X')$ and $r_{\text{apart}}(X, X')$, in
terms of two estimated parameters. What 
we would like to do is to provide an expla¬ 
nation that is less complex than the orig¬ 
inal observations, by deriving K parame¬ 
ters to explain M observations, where K is 
(substantially) less than M. In order to do 
this we have to relax our requirement that 
the parameters reconstruct the observations 
exactly. Instead, we try to find parameters 
that can be used to derive expected observa¬ 
tions that can be compared to the observed 
values. 

This is done using complicated statisti¬ 
cal estimation methods that are far beyond 
the scope of this book. However, a simple 
illustration can be used to show the spirit of 
model fitting, without going into computa¬ 
tional details. 

Suppose that we add to the previous
example the case of dizygotic (DZ)
twins raised together and apart. In theory,
this should not make any difference,
because DZ twins share the same amount
of genetic material as do biological
same-sex siblings (SS). In fact, though, the
correlations are raised: $r_{DZ(\text{together})}(X, X') = .55$
compared to $r_{SS(\text{together})}(X, X') = .47$,
and $r_{DZ(\text{apart})}(X, X') = .35$ compared to
$r_{SS(\text{apart})}(X, X') = .24$.

If we repeat the analysis of equations 8.13,
using DZ correlations instead of SS correlations,
our estimates become $h^2 = .70$, $e^2 = .30$,
and $r(E, E') = .67$ (actually 2/3). These
numbers reconstruct the DZ twins data,
but do not reconstruct the data for siblings.
What we want are parameter estimates that
can be used to approximate both the SS and
the DZ data. We could average the values
of the estimates obtained from SS and DZ
cases, so that

$$h^2 = \frac{.70 + .48}{2} = .59$$

$$r(E, E') = \frac{.67 + .44}{2} = .555.$$


The observed values of the correlations 
between test scores now differ from their 
predicted values: 


Correlation        Predicted Value    Observed Value

r_SS(apart)        .295               .24
r_SS(together)     .522               .47
r_DZ(apart)        .295               .35
r_DZ(together)     .522               .55


We have obtained greater parsimony, by 
reconstructing four observations from two 
estimated parameters, at the expense of 
accuracy. 

At this point a certain amount of judg¬ 
ment is used. One could argue that genetics 
and environment must work the same way in 
both DZ and SS pairs, but because DZ pairs 
are the same age, their environments may 
be more similar than the environments of 
SS pairs. This is not an entirely ad hoc argu¬ 
ment, for SS pairs are, by definition, made 
up of an older and a younger sibling, and 
just being the elder or the younger may have 
different influences on development. If you 
accept this argument, you should require 
that $h^2$ have the same value for DZ and SS
pairs, but you allow for separate values for
$r_{SS}(E, E')$ and $r_{DZ}(E, E')$. This changes the
predicted-observed comparison to


Correlation        Predicted Value    Observed Value

r_SS(apart)        .295               .24
r_SS(together)     .476               .47
r_DZ(apart)        .295               .35
r_DZ(together)     .568               .55


The predicted and observed values are
now closer to each other than before, but
some parsimony has been lost, because four
observations are being derived from three
estimates.
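
The two predicted-observed comparisons can be reproduced with a short Python sketch. It follows the averaging procedure described above, using the unrounded saturated estimates; the function and variable names are hypothetical, and the sum of squared errors is added here only as one convenient way to summarize misfit, not as the fit index used in the literature.

```python
R_G = 0.5  # r(G, G') for siblings and DZ twins, from genetic theory

# Observed correlations quoted in the text: (raised apart, raised together).
OBSERVED = {"SS": (0.24, 0.47), "DZ": (0.35, 0.55)}


def solve_saturated(r_apart, r_together):
    """Saturated estimates of h^2 and r(E, E') for one type of pair."""
    h2 = r_apart / R_G
    r_e = (r_together - r_apart) / (1.0 - h2)
    return h2, r_e


def predict(h2, r_e):
    """Predicted (apart, together) correlations for given parameters."""
    apart = h2 * R_G
    together = apart + (1.0 - h2) * r_e
    return apart, together


def sum_squared_error(h2, r_e_by_type):
    """Total squared gap between predicted and observed correlations."""
    total = 0.0
    for kind, (obs_apart, obs_together) in OBSERVED.items():
        pred_apart, pred_together = predict(h2, r_e_by_type[kind])
        total += (pred_apart - obs_apart) ** 2 + (pred_together - obs_together) ** 2
    return total


estimates = {kind: solve_saturated(*obs) for kind, obs in OBSERVED.items()}
h2_avg = sum(h2 for h2, _ in estimates.values()) / len(estimates)
r_e_avg = sum(r_e for _, r_e in estimates.values()) / len(estimates)

shared = {kind: r_e_avg for kind in OBSERVED}                 # two parameters
separate = {kind: r_e for kind, (_, r_e) in estimates.items()}  # three parameters

sse_two_params = sum_squared_error(h2_avg, shared)
sse_three_params = sum_squared_error(h2_avg, separate)

for label, r_e_by_type in (("shared r(E,E')", shared), ("separate r(E,E')", separate)):
    for kind in OBSERVED:
        apart, together = predict(h2_avg, r_e_by_type[kind])
        print(label, kind, round(apart, 3), round(together, 3))
```

The three-parameter version leaves a smaller total squared error than the two-parameter version, which is the trade between parsimony and accuracy described in the text.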

This example mirrors the use of structural 
equation modeling in modern QBG and, for 
that matter, in other fields in the behav¬ 
ioral and social sciences. The only major 
difference between what has been shown 
and what is done in practice is in the way 
the parameters are estimated. The averag¬ 
ing method was used solely because it is 
easy to understand. In practice scientists use 
computer-intensive techniques of parame¬ 
ter estimation, which we need not discuss. 
The important thing is to understand the 
principles behind model fitting, rather than 
the computational details. 
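
To give a feel for what "fitting" means computationally, here is a deliberately crude Python sketch that replaces averaging with a least-squares grid search over $h^2$ and a single shared $r(E, E')$. This is a swapped-in illustration, not the estimation procedure used by QBG researchers, whose methods are, as noted, considerably more elaborate. The loss function, the grid step, and all names are assumptions made for the example.

```python
R_G = 0.5  # genetic correlation for siblings and DZ twins

# (label, raised together?, observed correlation) from the running example.
OBSERVED = [
    ("SS apart", False, 0.24),
    ("SS together", True, 0.47),
    ("DZ apart", False, 0.35),
    ("DZ together", True, 0.55),
]


def predict(h2, r_e, together):
    """Simple model: r(E, E') contributes only for pairs raised together."""
    environmental = (1.0 - h2) * r_e if together else 0.0
    return h2 * R_G + environmental


def loss(h2, r_e):
    """Sum of squared differences between predicted and observed values."""
    return sum((predict(h2, r_e, t) - obs) ** 2 for _, t, obs in OBSERVED)


def grid_search(step=0.005):
    """Try every (h2, r_e) pair on a grid over [0, 1] x [0, 1]."""
    n = int(round(1.0 / step))
    best = None
    for i in range(n + 1):
        for j in range(n + 1):
            h2, r_e = i * step, j * step
            current = loss(h2, r_e)
            if best is None or current < best[0]:
                best = (current, h2, r_e)
    return best


best_loss, h2_hat, r_e_hat = grid_search()
print(round(h2_hat, 3), round(r_e_hat, 3), round(best_loss, 5))
```

The search lands on essentially the same $h^2$ as the averaging method, with a slightly different $r(E, E')$, because it minimizes the squared misfit directly instead of averaging two saturated solutions.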

8.2.8. A Rough-and-Ready Method
for Estimating $h^2$

Prior to the development of structural
equation modeling, QBG relied heavily on “rough
and ready” methods for calculating $h^2$. This
often forced researchers to rely on
questionable assumptions. Here are three commonly
offered comparisons.

One estimation method is based on com¬ 
parisons of the correlations between scores 
of MZ and DZ twins. If the twins are raised 
in the same family, 

$$r_{MZ} = h^2 + e^2\,r(E, E')$$

$$r_{DZ} = \tfrac{1}{2}h^2 + e^2\,r(E, E') \qquad (8.14)$$

$$h^2 = 2\,(r_{MZ} - r_{DZ}).$$

This contrast is valid if the correlation 
between environmental influences is the 
same for pairs of MZ and (same-sex) DZ 
twins. If MZ twins live in more similar envi¬ 
ronments than DZ twins, the approximation 
in equation 8.14 will overestimate the
heritability coefficient.

A similar logic can be used to contrast 
the MZ twin correlations to (same-sex) sib¬ 
ling correlations, or to parent-child corre¬ 
lations. In both cases the assumption that 
the same environmental correlation applies 
across types of relationship is suspect. 

Another contrast is based on the test 
scores of MZ and DZ twin pairs when both 


twin pairs have been raised apart. Such sit¬ 
uations are unusual, but they do occur. If 
it can be assumed that the adoption pro¬ 
cess itself does not introduce a correlation 
between environments, the environmental 
correlation, $r(E, E')$, will be zero for both
MZ and DZ twins. Therefore,

$$r_{MZ}(X, X') = h^2$$

$$r_{DZ}(X, X') = \tfrac{1}{2}h^2. \qquad (8.15)$$
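
Both rough-and-ready estimates reduce to one line of arithmetic each. The sketch below is illustrative; the function names are hypothetical, and the inputs are the correlations quoted earlier in this chapter (.78 and .50 for ten-year-old twins' scholastic achievement, .24 for siblings raised apart).

```python
def h2_from_twins_together(r_mz, r_dz):
    """Equation 8.14: h^2 = 2 (r_MZ - r_DZ).

    Valid only if e^2 r(E, E') is the same for MZ and DZ pairs.
    """
    return 2.0 * (r_mz - r_dz)


def h2_from_pairs_apart(r_apart, r_g):
    """Equation 8.15 in general form: r(X, X') = r(G, G') h^2 when r(E, E') = 0."""
    return r_apart / r_g


print(round(h2_from_twins_together(0.78, 0.50), 2))    # twins raised together
print(round(h2_from_pairs_apart(0.24, r_g=0.5), 2))    # siblings raised apart
print(round(h2_from_pairs_apart(0.76, r_g=1.0), 2))    # hypothetical MZ-apart value
```

The .76 in the last line is a hypothetical input used only to show that for MZ twins raised apart the correlation itself is the estimate.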

8.2.9. The ACE Model 

We now examine a realistic, widely used 
model in QBG research on intelligence, 
the ACE model. This model assumes three 
sources of variation in test scores or similar 
indices: 

A: Additive genetic effects. The sum of the 
effects of different genes, without consider¬ 
ing dominance and epistatic effects. 

C: Shared environmental effects. “Shared”
here refers to shared variation. Recall that
QBG models operate on pairs of individuals,
such as siblings or twins, who stand in
some relation to each other. Shared
environmental effects refer to variations in the
environment that vary across pairs, but
are shared within a pair. For siblings, the
clearest example of nonshared environment
effects is a difference in the family
environment. Even if both members of a sibling
pair grow up in the same family, one will
be older than the other, so the older and
younger sib have different family
environments.

E: Nonshared environmental effects. These
are effects of the environment that differ
both across pairs and within a pair.
Continuing with the sibling example, each
sibling is part of the other sibling’s
environment. An environmental effect associated
with being an older or younger sibling -
and there are such effects, as we shall see
in Chapter 9 - would be a nonshared
environmental effect.

The causal equation for the ACE model
is

$$X = A + C + E, \qquad (8.16)$$

where X is a phenotypical trait score, A
refers to additive genetic effects, C refers to
shared environmental effects, and E refers to
nonshared environmental effects. The path
diagram for the ACE model is shown in
Figure 8.7. As in Figure 8.6, observable
correlations and theoretically dictated parameters
are shown in boldface, and free parameters
are shown in open type. The model
permits estimation of heritability coefficients
($h^2$) and separate environmental coefficients
for shared ($c^2$) and nonshared ($e^2$)
environmental influences.

Figure 8.7. The ACE model. The path coefficient $r(A, A')$ is
determined by genetic theory. The path coefficient $r(C, C')$ is set to
1, by the definition of shared environmental effects that act on each
member of a pair. By a similar definition, $r(E, E')$ must be zero, and
therefore is not contained in the model. The correlation between
observable traits, $r(X, X')$, is determined by the data. Genetic and
environmental coefficients a, c, e are estimated by fitting the model
to the data for three or more types of pairs (e.g., MZ and DZ twins
and siblings).

The ACE model is the model of choice
for many of today's QBG studies of
cognitive and personality traits. It has been
criticized for failing to allow for
gene-by-environment interactions and correlations.
If these are present they will be erroneously
assigned either to environmental or genetic
main effects. Alternative models that do
permit measurement of interactions have
recently been developed, 18 but they require
more observed correlations than the ACE
model requires.
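
Applying the path-tracing rule to the fixed parameters in Figure 8.7 gives expected correlations of $a^2 r(A, A') + c^2$ for any pair raised together, since $r(C, C') = 1$ and $r(E, E') = 0$. The Python sketch below works through the simplest case, in which only MZ and DZ correlations are available and the three components are solved for directly rather than fitted. That restriction, the function names, and the use of the .78 and .50 twin correlations quoted earlier are assumptions made for illustration; with three or more types of pairs the model would be fitted, not solved.

```python
def ace_expected_correlation(a2, c2, r_a):
    """Path tracing in Figure 8.7: r(X, X') = a^2 r(A, A') + c^2 * 1 + e^2 * 0."""
    return a2 * r_a + c2


def ace_from_twins(r_mz, r_dz):
    """Just-identified ACE solution from twins raised together.

    Uses r_MZ = a^2 + c^2, r_DZ = .5 a^2 + c^2, and a^2 + c^2 + e^2 = 1.
    """
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic share
    c2 = r_mz - a2             # shared environment (equals 2 r_DZ - r_MZ)
    e2 = 1.0 - r_mz            # nonshared environment
    return a2, c2, e2


a2, c2, e2 = ace_from_twins(r_mz=0.78, r_dz=0.50)
print(round(a2, 2), round(c2, 2), round(e2, 2))
print(round(ace_expected_correlation(a2, c2, r_a=1.0), 2),
      round(ace_expected_correlation(a2, c2, r_a=0.5), 2))
```

Nothing in this arithmetic prevents $c^2$ from coming out negative when $r_{MZ}$ is more than twice $r_{DZ}$, which is one signal that the additive assumptions do not hold for the data at hand.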

18 Johnson, 2007. 


8.3. Observed Estimates of Heritability 

Table 8.2 summarizes the results of over 
200 studies of the heritability of intelli¬ 
gence, conducted up to 2000. I know of 
no studies since then that would substan¬ 
tially alter these results. Most of these stud¬ 
ies involved European or North American, 
usually White, participants, with sampling 
somewhat but not markedly biased against 
the inclusion of low SES participants. In 
theory, studies in other populations could 
produce different results. However, stud¬ 
ies done in both urban and rural India 
and in Japan have produced comparable 
correlations. 19 

19 These studies all rely on a comparison of the
correlations between monozygotic and dizygotic twins
reared together. The Indian estimates, based on
two small studies, report MZ correlations in the .9
range, which is exceptionally high for this sort of
study. The DZ correlations were slightly less than
.50. The resulting heritability estimate was in the
.80 range (Nathawat & Puri, 1995; Pal, Shyam, &
Singh, 1997). The Japanese study (Lynn & Hattori,
1990) produced a heritability estimate of
approximately .60. Indian results are close to the highest
values reported for studies using a contrast between
MZ and DZ twins, but they are not entirely out of
line. The Japanese study is in agreement with
estimates of heritability obtained in Western
industrialized nations.

Table 8.2. Correlations and heritability estimates for pairs of individuals of various
degrees of relationship. The right-hand column shows the heritability coefficients
calculated using the simple model of Figure 8.5, which assumes that environmental
correlations are zero for individuals not raised in the same home.

Relationship                    Raised in    Degree of      Correlation   h^2        Comparison Used
                                Same Home?   Genetic        of Test       Estimate   to Make Estimate
                                             Relationship   Scores

MZ twins                        Yes          1              .86           *
MZ twins                        No           1              .76           .76        MZ apart
DZ twins                        Yes          .5             .55           .60        2(MZ together - DZ together)
DZ twins                        No           .5             .35           (1) .82    (1) 2(MZ apart - DZ apart)
                                                                          (2) .70    (2) 2(DZ apart)
Siblings (SS) raised together   Yes          .5             .47           .78        2(MZ together - sibling together)
Siblings (SS) raised apart      No           .5             .24           .48        2(SS apart)
Unrelated individuals of        Yes          0              .25           0          *
  different ages, raised in
  same homes; intelligence
  measured in childhood
“Virtual twins” measured        Yes          0              .26           *          *
  in childhood
Unrelated individuals of        Yes          0              .02           *          *
  different age, raised in
  same homes; intelligence
  measured in adulthood
Parent-child                    Yes          .5             .42           *          *
Parent-child                    No           .5             .24           .48        2(Parent-child apart)

Note: The data is based on summaries by Scarr, McGue et al., and by Plomin et al. and virtual twin
studies by N. Segal and her colleagues (McGue et al., 1993; Plomin et al., 1997, p. 140; Scarr, 1997,
p. 28). Unless specified otherwise the studies summarized were based on scores obtained when the
offspring were in childhood and adolescence.

The right-hand column of Table 8.2 is the
estimate of heritability obtained using the
simple model presented in Figure 8.5, which
assumes independent and additive
environmental and genetic effects. Granted that
the model is simplified, let us consider the
important trends.

Heritability estimates lie in the .5-.8
range.

There are environmental effects. For all
four nonzero biological relationships the 
correlation between pairs is from .10 to .20 
points higher when the pairs are raised in 
the same home. This increase is in the cor¬ 
relation coefficient, and is not affected by 
changes in the level of scores. We will deal 
with this in Chapter 9, where adoption 
effects are discussed. 

The three rows labeled “unrelated indi¬ 
viduals raised in the same household” and 
“virtual twins” are of special interest. “Same 
household” correlations refer either to 
children and adopted children in the same 
family, or to unrelated individuals who 
have been adopted into the same family. 
“Virtual twins” are a subset of the “same 
household” group in which the individuals 
are within nine months of age of each other, 
and have lived in the same household since 
before their first birthday. The correlation 
between cognitive test scores of virtual 
twins is .26 when taken in childhood. It 
would be of considerable interest to know if 
this correlation changes over the life span, 
but as of this writing the necessary research 
has not been done. 

While there is considerable variability in 
the estimates, the variability is systematic. 
Heritability estimates involving twins are 
higher than heritability estimates that do not 
involve twins. This seems to be a fairly con¬ 
sistent finding over the studies I have exam¬ 
ined. The correlation between the scores of 
DZ twins raised apart is substantially higher 
than the correlation between the scores 
of siblings raised apart, even though their 
genetic relationships are identical. Why? We 
do not know, but can consider some possi¬ 
bilities. 

The method of estimation assumed that 
there were no gene-by-environment interac¬ 
tions. To the extent that a person’s genetic 
makeup is important in shaping the envi¬ 
ronment, omitting this factor would inflate 
the correlation between MZ twins. A similar 
factor could influence the contrast between 
DZ twins and siblings. DZ and SS pairs 
are equal in the extent to which they share 
genetic makeup. However, DZ twins share 
the same prenatal environment, while SS 


pairs do not. Once they are born, any time- 
locked phenomenon that might influence 
cognition, such as exposure to measles or 
hepatitis, will influence DZ twins at the 
same age (which in some cases is an index 
of vulnerability), while SS pairs, being of 
different ages, will have different degrees of 
vulnerability. In the case of children raised 
together, SS pairs exert asymmetric influ¬ 
ences on each other’s environment, because 
one will be older than the other. 

The rough and ready estimates of heri¬ 
tability presented in Table 8.2 cannot be dis¬ 
regarded. However, the exact values should 
not be regarded as set in stone, due to 
the failure to consider gene-by-environment 
interactions and correlations. 

8.3.1. Adoption Studies 

Adoption studies provide a way of separat¬ 
ing environmental and genetic influences on 
intelligence. Environmental effects are indi¬ 
cated by correlations between an adoptee 
and members of his or her adoptive family, 
genetic influences by correlations between 
the intelligence of an adoptee and the bio¬ 
logical parents. In the ideal adoption study 
the children will have been adopted very 
early in life, data will be available on the 
intelligence of both biological parents, and 
the adoptees will be followed until adult¬ 
hood, to allow for dissipation of environ¬ 
mental effects and the effect of genetic traits 
that become apparent during adolescence, 
adulthood, and possibly old age. Such an 
ideal study has not been done, but several 
have come close. Four adoption studies are 
described in panel 8.5. 

The results of virtually all adoption stud¬ 
ies are consistent with the simple environ¬ 
ment + genetic model described in Fig¬ 
ures 8.5 and 8.6. As one reviewer put it, 

The heritability of cognitive ability is 
around .50 when collapsing across all stud¬ 
ies. However heritability appears to vary 
with age, with h 2 = .40 in early childhood, 
rising to .60 in early adulthood, finally ris¬ 
ing to h 2 = .80 in later life. 

Petrill, 2002, p. 284

Panel 8.5. Adoption Projects

Studies comparing adopted children to 
children living with their biological par¬ 
ents provide an opportunity to compare 
the effects of genetic inheritance and 
environmental influences on intelligence. 
The analysis is not simple. 

At first glance, it might seem that 
any influence on an adoptee’s intelli¬ 
gence that was associated with the adop¬ 
tive family would be independent of the 
adoptee’s genotype. This is not correct, 
for to a considerable extent people make 
their own environments. This can be 
apparent at a very young age. Children’s 
reactivity has a partially genetic basis, and 
reactivity will influence how adults and 
other children interact with the adoptee. 
Such effects cannot be captured by a 
statistical model that assumes the non¬ 
existence of gene-by-environment inter¬ 
actions. Unfortunately, while it is 
easy to conceptualize a model that 
includes gene-by-environment interac¬ 
tions, it can be very difficult to obtain suf¬ 
ficient measurements to evaluate such a 
model. 

Then we come to the practical rea¬ 
sons. In order to achieve adequate statis¬ 
tical reliability, and in order to allow for 
drop-outs, a good study should involve 
at least 200 people, and preferably more. 
The studies are of most value if the 
children can be followed to maturity. 
Because of the difficulty of following people
for that long, it is desirable to take
as many measures as possible. (While we
will concentrate on measures of intelligence,
many adoption studies also involve
detailed studies of personality
characteristics.) Large-scale studies that take place
over an extended period of time require 
substantial amounts of long-term fund¬ 
ing, which is difficult to obtain. 

From the point of view of a study 
designer, adoption should occur as soon 
after birth as possible, in order to mini¬ 
mize the exposure that adoptees have to 


an environment related to the biological 
parents. The study designer wants place¬ 
ments to be random, so that there is no 
correlation between genetic and environ¬ 
mental potential. In practice, this may 
not happen, especially if the adoption 
agency attempts to match characteristics 
of the biological parents (or, more likely, 
the biological mother) with characteris¬ 
tics of the adopting family. Such a policy 
would produce a gene-environment cor¬ 
relation that would be hard to evaluate. 

Generalization is a problem. Children 
put up for adoption and the families 
who wish to adopt them are not repre¬ 
sentative of the general population. Par¬ 
ents who wish to adopt a child tend to 
be wealthier and better-educated than 
parents who put their children up for 
adoption. There is further self-selection 
in transracial adoptions and in adoptions 
of foreign children. Adoption agencies, 
quite reasonably, try to ensure that the 
child will be raised in a socially and eco¬ 
nomically healthy environment. This is a 
reasonable policy, but it does restrict the 
range of environments to which adopted 
children are exposed. The very fact that 
families are willing to adopt children indi¬ 
cates that, as a group, they place consid¬ 
erable value on raising children. All these 
effects act to reduce the variance in the 
environment of adopting families, com¬ 
pared to the range of environmental vari¬ 
ables in the general population. There¬ 
fore, adoption studies may underestimate 
the extent to which environmental vari¬ 
ables influence the development of intel¬ 
ligence in the population at large.* 

This panel describes four adoption 
studies carried out in the United States. 
They are somewhat different, but the 
findings are generally compatible. Taken 
together, they provide valuable infor¬ 
mation about the relative influences 
of genetic inheritance and environment 
upon intelligence. 


The Texas Adoption Project 

The Texas Adoption Project began in the
early 1960s. It focused on 300 adoptees 
from a church-affiliated adoption home 
for unwed mothers. Both the biological 
mothers and the adoptive parents were, 
for the most part, white and middle-class. 
This probably reflects the social mores 
of the day. A woman who was a single 
parent faced greater social disapproval in 
the 1960s than she would today, and as a 
result there was more pressure to give up 
a baby for adoption. 

The adoptees, their biological mothers,
adopting parents, and the biological
children of adopting parents were
given intelligence tests. Either the WAIS
or a nonverbal test was given, partly 
because of the varying ages of the chil¬ 
dren involved and also, in the case of the 
biological mothers, because of adminis¬ 
trative conditions in the adoption home. 
The biological parents had an average IQ 
score of 100, but there was less variability 
in test scores than would be the case in 
the general population. The adoptive
parents’ IQs were slightly higher than those
of the biological parents: 100-101 for
mothers, depending on the test used, and 
104 or 105 for fathers. The mean IQ scores 
of the natural and adopted children were 
in the 104-105 range, again depending on 
what test was used. The restriction in 
range in all groups would have the effect 
of reducing correlations between the test 
scores of various classes of individuals, 
compared to the equivalent correlations 
in a representative population. The study 
is also different from several other stud¬ 
ies in that the biological mothers and 
the adoptive parents had roughly equiva¬ 
lent SES. This contrasts with many other 
studies where adoptive parents tend to
be both better educated and economically
better off (and hence better able to
afford to raise a child) than the biological 
mothers. 
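
The attenuation produced by restriction of range can be sketched numerically. What follows is an illustration only, not part of the Texas analysis. It assumes direct selection on one of the two scores and a linear relation between them, and it uses the standard range-restriction formula. The function name and the example values (a population correlation of .50, a sample with three-quarters of the population standard deviation) are hypothetical.

```python
import math


def restricted_correlation(r_population: float, sd_ratio: float) -> float:
    """Correlation expected in a range-restricted sample.

    r_population: correlation in the unrestricted population.
    sd_ratio: restricted SD divided by population SD (1.0 = no restriction).
    Assumes direct restriction on one variable and a linear relation.
    """
    r, u = r_population, sd_ratio
    return (r * u) / math.sqrt(1.0 - r * r + r * r * u * u)


# Hypothetical values, chosen only for illustration.
full = restricted_correlation(0.50, 1.00)    # no restriction of range
narrow = restricted_correlation(0.50, 0.75)  # SD reduced by a quarter
print(round(full, 2), round(narrow, 2))
```

Under these assumed values the correlation in the restricted sample is noticeably smaller than in the population, which is the direction of bias described for the Texas sample.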

There were two waves of testing, an 
initial wave when the adopted children 
were eight years old, on the average (with 
a considerable range), and a follow-up 
of about 50% of the original sample ten 
years later. A variety of cognitive, per¬ 
sonality, and health measures were taken. 
Twenty years after the initiation of the 
study, a postal questionnaire was sent 
to all adoptees and biological children 
of participants and to the adoptive par¬ 
ents. The questionnaire covered various 
aspects of life adjustment, such as occu¬ 
pational and marital status. 

Genetic influences were substantially 
larger than environmental influences. 
The correlations between the cognitive test
scores of the birth mother and those of the
adopted child ranged from .23 to
.36 in the initial study, depending upon
the intelligence test used, and from .26
to .78 (only a small number of cases)
at the ten-year follow-up. The change
over time illustrates Petrill’s comment
that the heritability coefficient increases
with time. The correlations between
scores of biologically unrelated individuals
(adoptive mothers and fathers, and
adopted children) ranged from .19 to
.08 (again, depending on the test used
and whether the mother’s or father’s score
is being considered) at the original testing,
and from .15 to -.02 in the follow-up.

The postal survey twenty years later 
showed that the biological children of the 
adopting parents were doing slightly bet¬ 
ter than the adopted children, at least 
in terms of conventional, self-reported 
social adjustment. Adopting-parent char¬ 
acteristics were better predictors of the 
social adjustment of their biological than 
of their adopted children. 
The Colorado Adoption Study

The Colorado Adoption Study, begun in
the 1970s, was a twenty-year study of children
recruited from two adoption agencies
in Denver, Colorado.‡ As was the
case for the Texas study, the participants 
were largely White and middle-class. In 
addition to studying the development 
of adopted children, the experiment 
included a comparison group of over 200 
families consisting of biological parents 
and children. Cognitive tests were con¬ 
ducted when the children were one, two, 
three, four, seven, twelve, and sixteen 
years of age. This made it possible to 
compare stability of intelligence, in the 
sense of the relative standing of a person 
compared to others of the same age, from 
infancy to adolescence. 

The correlations between the test 
scores of adoptees and their adoptive par¬ 
ents were near zero at all ages. The cor¬ 
relations between the biological mothers’ 
scores and the adoptees’ scores increased 
from .18 in early and middle childhood 
to .38 at age sixteen. Factor analyses of 
the test scores showed that general intel¬ 
ligence, as indicated by the first factor 
on the cognitive tests, g, was responsible 
for the consistency in individual perfor¬ 
mance, and that this factor had a sub¬ 
stantial genetic load. Instabilities in test 
scores over time appeared to be related to 
differences in nonshared, within-family, 
environmental influences. 


The University of Minnesota Studies 

The next two studies were both done at
the University of Minnesota.§ The transracial
adoption study (TRA) dealt with
African American children adopted by 
White families. Unlike the Texas and 
Colorado studies, in the TRA there was
a substantial disparity between the socioeconomic
status (and concomitant educational
levels and intelligence test scores)
of the biological mother and the adop¬ 
tive parents. This difference was central 
to the study, which was motivated by a 
desire to see what the effects of improved 
environmental conditions would be on 
the adopted children. The issues involv¬ 
ing racial differences in intelligence will 
be discussed in Chapter 11. The corre¬ 
lation between scores for adoptive par¬ 
ents and adopted children was .29, while 
the correlation for biological mothers and 
adopted children was .43. These correla¬ 
tions are somewhat higher but not out of 
line with the general trends reported in 
Table 8.2. 

The adolescent adoption (AA) study 
was a study of people ranging in age 
from sixteen to twenty-two. The inves¬ 
tigators tested a group of adoptees who 
had spent an average of eighteen years 
in their adopted homes. The correlation 
between scores of biologically related 
siblings was .35, while the correlation 
between adoptees and unrelated siblings, 
raised in the same home for an average of 
eighteen years, was zero! 

Virtually all adoption studies in North 
America and Europe produce similar 
results. The pattern of correlations indicates
that h² for general intelligence is
at least .50. Environmental effects due to 
differences between families can be sub¬ 
stantial early in childhood, but decrease 
in adolescence and almost vanish in adult¬ 
hood. This suggests a dissipation in the 
effects of the adopting home over a per¬ 
son’s lifetime, which is certainly reason¬ 
able. The genetic inheritance is retained 
throughout life. 

* Nisbett, 2009.

† Loehlin, Horn, & Willerman, 1997; Loehlin, Horn, & Ernst, 2007.

‡ Petrill et al., 2004; Plomin et al., 1997.

§ Scarr & Weinberg, 1983.
Petrill’s summary, and others like it, are
statements about the effect of adoption 
upon variation in measures of cognition. 
They say nothing about the effect of adop¬ 
tion upon the mean scores of adopted chil¬ 
dren, that is, about any benefit or general 
cost of adoption. Such effects are found. 
There is generally a temporary elevation of 
cognitive skills, as indicated by test scores, 
through the school years, but this dissipates 
in adolescence and adulthood. This does 
not mean that adoption does no good, for 
an increase in cognitive abilities during the 
school years can itself be important to devel¬ 
opment later in life. 

The second qualification has to do with 
generalization. Can heritability coefficients 
from studies of adopted children be extrap¬ 
olated to people in general? A case can be 
made that they cannot, on the grounds that 
adults who adopt children are less variable, 
within their own population, than parents 
in general. For example, the median age of 
women who adopt children is thirty. This 
is higher than the median age at childbirth of
women who raise their own children. Adults 
who adopt tend to be of higher SES and 
to have more education than typical par¬ 
ents. These tendencies work to reduce the 
environmental variability among adopting 
families to a lower value than population 
variability. As pointed out earlier, if envi¬ 
ronmental variance is reduced, heritability 
coefficients will go up. 20 The extent of the 
upward bias in heritability introduced by the 
homogeneity of adopting families is hard to 
estimate, but the fact that there is some bias 
should be kept in mind. 
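
The arithmetic behind this point is simple enough to sketch. The example below treats heritability as the ratio of genetic variance to total variance and ignores any gene-environment covariance or interaction. The variance values are hypothetical, chosen only to show the direction of the bias, and the function name is my own.

```python
def heritability(genetic_variance: float, environmental_variance: float) -> float:
    """Proportion of total variance attributable to genetic variance.

    Simplifying assumption: total variance is the sum of the two
    components, with no covariance or interaction between them.
    """
    total = genetic_variance + environmental_variance
    return genetic_variance / total


# Hypothetical variances: the genetic variance is held constant while the
# environmental variance is smaller among adopting families.
general_population = heritability(50.0, 50.0)
adopting_families = heritability(50.0, 30.0)
print(general_population, adopting_families)
```

Nothing about the genes has changed between the two calls. The coefficient rises only because the denominator has shrunk.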

8.3.2. Twin Studies

The study of twins is of great interest in 
QBG, because the contrast between MZ and 
DZ twins shows the effects associated with 
a clearly established degree of genetic dif¬ 
ference, between people of the same age, 
raised in either the same or different envi¬ 
ronments. Doing such studies is difficult, but 
not impossible, due to the relative scarcity 

20 Stoolmiller, 1999. 


of twins. DZ twins occur about once in 100 
births, MZ twins once in 250 births. 21 

The results from different studies are 
consistent. 22 The MZ correlations are about 
.8, regardless of the twins’ age. The correla¬ 
tions between DZ twins are about .6 until 
adulthood, at which point they drop to .4. 
This implies that the relative importance of 
genetic influences on intelligence increases 
in adulthood. The extent of the increase is 
illustrated in Figure 8.8, which shows the 
results of a large study in the Netherlands, 
which is typical of other findings. 23 The heri¬ 
tability coefficient was modest in early child¬ 
hood, but increased to a staggering .85 in late 
adulthood. The result is similar to results 
found in a US study of twins. 24 
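
The logic behind this inference can be made concrete with Falconer's classic approximation, which divides the variance into additive genetic, shared environmental, and nonshared environmental parts using the two twin correlations alone. The sketch below is an illustration, not the method of the studies cited, which fit more elaborate models. It assumes additive genetic effects, equally similar environments for MZ and DZ pairs, and no assortative mating. The function name is my own.

```python
def falconer_decomposition(r_mz: float, r_dz: float) -> dict:
    """Approximate variance components from twin correlations.

    h2 = 2 * (r_mz - r_dz)   additive genetic
    c2 = 2 * r_dz - r_mz     shared environment
    e2 = 1 - r_mz            nonshared environment (plus measurement error)
    """
    return {
        "additive_genetic": 2.0 * (r_mz - r_dz),
        "shared_environment": 2.0 * r_dz - r_mz,
        "nonshared_environment": 1.0 - r_mz,
    }


# Correlations quoted in the text: MZ about .8 at all ages,
# DZ about .6 before adulthood and about .4 in adulthood.
childhood = falconer_decomposition(0.8, 0.6)
adulthood = falconer_decomposition(0.8, 0.4)
for label, parts in (("childhood", childhood), ("adulthood", adulthood)):
    print(label, {name: round(value, 2) for name, value in parts.items()})
```

With the MZ correlation fixed, the drop in the DZ correlation moves variance from the shared-environment term to the genetic term, while the nonshared term is unchanged.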

While the findings from these studies are 
consistent among themselves, recent stud¬ 
ies have questioned the generality of their 
conclusions. Most of the twin studies rely 
on voluntary participation, sometimes over 
a considerable period of time. This intro¬ 
duces a bias toward the participation of 
better-educated, higher socioeconomic sta¬ 
tus children. Two recent studies of very large 
samples, taken over a wider range of the 
population, illustrate how this bias may 
affect estimates of the heritability of cog¬ 
nitive ability. One of these studies utilized 
WISC scores for seven-year-old children, 
obtained as part of a national study of chil¬ 
dren's health. The overall sample of over 
300 twin pairs was split into a low SES group 
(one-third of the mothers in this group were 
on welfare) and a high SES group. The 
differences were dramatic. The heritability 
coefficient, h², was .72 in the high SES group
and only .10 in the low SES group. A second 
study found a similar pattern in seventeen- 
year-olds. 25 

A similar study from the Netherlands 
found little, if any, difference in the 

21 The figures are given for natural births for Whites 

in the United States. African-Americans give birth 

to more twins, and Asian-Americans to fewer. 

22 McGue et al., 1993, p. 63. See also Boomsma, 

Busjahn, & Peltonen, 2002. 

23 Posthuma, De Geus, & Boomsma, 2004. 

24 McGue et al., 1993, p. 72. 

25 Turkheimer et al., 2003; Harden et al., 2007. 
[Figure 8.8: horizontal bar chart. Legend: additive genetic, shared environment, nonshared environment. Horizontal axis: percent variance accounted for, 0 to 100.]

Figure 8.8. Percentage of the variance in intelligence test scores
accounted for by additive genetic influence, shared environmental
influence, and nonshared environmental influence. Based on
selected data from the Dutch Twins Study; Posthuma, De Geus,
& Boomsma, 2004, Figure 9.1.



heritability of adult intelligence as a func¬ 
tion of various socioeconomic indicators. 26 
The authors of that study suggest that the 
expression of intelligence in children may be 
more sensitive to variations in the environ¬ 
ment than is the expression of intelligence 
in adults. The authors also point out that 
studies of variation, which is typically the 
target of a QBG analysis, do not evaluate dif¬ 
ferences in means across groups. The study 
authors found a positive association between 
SES and adult intelligence test scores apart 
from the influence of heritability. 

With a few exceptions, the twins studied 
in the research just described were raised 
in the same home, by their biological par¬ 
ents. Conceptually, the almost-perfect study 
of the heritability of IQ is the study of the 
intelligence of MZ and DZ twins raised in 
different households by adoptive parents, 
preferably without any interaction between 
the two twins. Pragmatically, such a study 
is hard to achieve. Twins are rare enough, 
and only a small fraction of all twins are 
adopted apart. The first study of similarities 
in MZ twins raised apart was a case study 
of a single pair, in 1922. There were only 
three reputable studies, with twelve, nine¬ 
teen, and thirty-eight pairs of twins, in the 
next forty years. 27 In spite of the small sam¬ 
ple sizes these studies produced consistent 
results. The correlations between test scores 

26 Van der Sluis, Willemsen, et al., 2008. 

27 Bouchard, 1997, Table 5.1 and accompanying text. 


for MZ twins raised apart (r_MZA) were, for
each study, .71, .69, and .75.
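
For readers who want a single summary figure, small-sample correlations such as these are conventionally pooled by averaging them on Fisher's z scale, weighting each study by its number of pairs minus three. The sketch below is my own illustration, not a calculation reported in the sources. It assumes that the three correlations are listed in the same order as the three sample sizes, which the text does not state explicitly.

```python
import math


def pooled_correlation(correlations, pair_counts):
    """Pool correlations with a weighted mean on Fisher's z scale.

    Each correlation is transformed with atanh, weighted by (n - 3),
    averaged, and transformed back with tanh.
    """
    weights = [n - 3 for n in pair_counts]
    z_values = [math.atanh(r) for r in correlations]
    mean_z = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    return math.tanh(mean_z)


# Values from the text; the pairing of r with n is an assumption.
r_mza = [0.71, 0.69, 0.75]
pairs = [12, 19, 38]
pooled = pooled_correlation(r_mza, pairs)
print(round(pooled, 2))
```

If the homes in which separated twins were raised are uncorrelated, the correlation between MZ twins raised apart can be read directly as an estimate of heritability, which is why these small studies attracted so much attention.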

There was also a larger, highly publicized 
study that reported r_MZA = .77. Unfortunately,
in one of the more embarrassing
moments in the history of Psychology, it 
was subsequently found that the data was 
not credible. Whether this was due to fraud 
or to unacceptably careless record keeping 
is still debated. The incident is described in 
panel 8.6. 

The situation today is much better. 
Beginning in the 1970s, Thomas Bouchard, 
Jr., and his colleagues at the University of 
Minnesota began the Minnesota Study of 
Twins Raised Apart (MISTRA). Details 
of MISTRA are provided in panel 8.7. When 
enrollment ceased, in 2000, the MISTRA 
researchers had evaluated 139 twin pairs, in 
addition to evaluating the twins’ spouses, 
partners, and some adoptive and biological 
relatives. The evaluations included person¬ 
ality testing and collection of biographical 
and anthropometric data. In the intelligence 
part of the evaluation participants took three 
previously developed batteries of cognitive 
tests, totaling forty-two tests of cognitive 
skills. 28 

About the same time that MISTRA 
began, a group of Swedish researchers ini¬ 
tiated an extensive study of twins, raised 
together and raised apart. 29 Sweden, like 

28 See Johnson et al., 2007a, or Johnson & Bouchard, 

2005, for a detailed listing of the tests. 

29 Finkel & Pedersen, 2004. 
Panel 8.6. The History of Studies
of Twins Adopted Apart: I. Sir 
Cyril Burt 

This panel describes one of the fascinat¬ 
ing stories in the history of psychology, 
complete with intrigue and, perhaps, a 
villain. 

Cyril Burt (1883-1971) was an ex¬ 
tremely influential British psychologist. 
He was an early factor analyst, claiming 
credit for some of the major advances 
in the field. He held important posts 
in educational policy and research and 
conducted important research on juve¬ 
nile delinquency. In 1932 Burt succeeded 
Spearman as Professor of Psychology at 
University College, London. A Professor¬ 
ship was then equivalent to a lifetime 
appointment as a Department Chair. 
The appointment carried with it much 
more authority and prestige than do 
such positions today. He appears to have 
discharged his duties in an intellectu¬ 
ally impressive but socially imperious 
manner. 

Burt retired in 1950 but continued to 
publish until his death in 1971. Among 
his other honors Burt was knighted and 
received a gold medal for distinguished 
lifetime contribution from the American 
Psychological Association. 

Sometime before World War II, Burt 
began collecting data on twins. He pub¬ 
lished a variety of papers on the topic in 
the interval between 1943 and 1966. An 
additional paper was published posthu¬ 
mously in 1972. The most noted of these 
papers dealt with MZ twins raised apart. 
Some of the correlations reported were 
in the high .8’s and even .9’s, indicat¬ 
ing almost complete heritability of intel¬ 
ligence. Related papers confirming Burt’s 
results were published by a “Miss Conway”
and a “Miss Howard.”

In 1974, three years after Burt’s death, 
Leon Kamin, a professor at Princeton 
and an ardent opponent of the idea 
that intelligence is determined geneti¬ 


cally, observed that some of Burt’s cor¬ 
relations, which were purportedly on 
different data sets, were identical to the 
third decimal point. The probability of 
this happening by chance is minuscule.
Subsequently, Arthur Jensen, who is an 
advocate of genetic theories of intelli¬ 
gence, made a searching reexamination 
of Burt’s data, and found additional sus¬ 
picious entries.* Jensen pointed out that 
the results could either have been fraud¬ 
ulent or perhaps the result of confusion, 
as Burt was elderly at the time the papers 
were written. The plot thickened when a 
biographer of Burt’s, Leslie Hearnshaw, 
found papers that suggested both that 
some of Burt's data was fraudulent and 
that the papers by Conway and Howard 
had in fact been written by Burt himself.†
There appear to be few, if any, records 
that unambiguously document Conway’s 
or Howard's career. Since Jensen’s paper, 
in 1974, no knowledgeable scientist cites 
Burt’s data. 

I do not think we will ever know 
whether Burt was intentionally fraudu¬ 
lent or unacceptably careless in his later 
years. The cases for and against him rest 
largely on circumstantial evidence. But 
the issue is not whether Burt was a fraud 
or a good scientist gone bad as he aged. 
It is whether or not he hindered the 
advancement of knowledge. 

The correlations Burt reported are 
only slightly higher than those reported, 
with far better data and no hint of fraud, 
by several contemporary studies of twins 
raised apart. So in the narrow sense Burt 
did not harm science. His claims did not 
lead scientists up a garden path, chasing 
a false result. 

In a broader sense, Burt did profound 
harm to science in general, to Psychology, 
and in particular to Behavior Genetics. 

In the 1960s and 1970s psychologists 
and educators emphasized the role of 
learning and social opportunity as agents 
for the development of cognitive skills. 
In the United States these beliefs were 
closely tied to social and moral issues,
for there were great hopes that the end 
of segregated schooling would quickly 
result in economic and social equality 
between African Americans and Whites. 
Similar feelings were also strongly held 
in Europe. However, progress toward 
equality following desegregation proved 
to be much slower than had been 
anticipated. 

In 1969 Jensen published a highly con¬ 
troversial paper in the Harvard Edu¬ 
cational Review in which he pointed 
out that Black-White differences in aca¬ 
demic achievement could be due in 
part to genetic differences between 
the races.‡ Jensen cited ten of Burt’s
papers in support of his argument.
Jensen’s article elicited a furious counterargument.
The subsequent revelation
that Burt’s data was, at best, incompetent 
and, at worst, fraudulent led not only to 
a rejection of Jensen’s argument but also 


to a general, albeit unjustified, condem¬ 
nation of genetic studies of intelligence. 
This hurt the field in a variety of ways, 
not the least of which was a drying up of 
funds for research. 

Science ought to inform policy mak¬ 
ers wrestling with political and social 
issues. The only claim that scientists can 
make to a privileged status in such a 
discussion is their commitment to sup¬ 
porting their arguments with objectively 
collected data and thoughtful analysis, 
rather than moral suasion. When a scientist
offers an opinion based on indefensible
data it hurts us all.§

My comments are based on a book on Burt
edited by Mackintosh (1995) that includes papers
by two who had known Burt personally, Arthur
Jensen and Hans Eysenck.

* Jensen, 1974.

† Hearnshaw, 1979.

‡ Jensen, 1969.

§ See Hunt & Carlson, 2007a,b, for elaborations
on this point.


many European countries, maintains gov¬ 
ernment records of the health of its citi¬ 
zens. In 1978 some researchers noticed that 
a substantial number of twins born prior 
to 1945 indicated that they had been raised 
separately as children. 30 The investigators
contacted all twins who were age fifty or 
older, and many of them agreed to inter¬ 
views and fairly extensive psychological test¬ 
ing, although far short of the testing involved 
in the MISTRA project. 

The Swedish and MISTRA studies com¬ 
plement each other. MISTRA dealt with 
twins aged eighteen upward into their six¬ 
ties and seventies, in a single testing, using 
a cross-sectional design. The Swedish study 
tested participants on four different occa¬ 
sions, spanning a thirteen-year period. Com¬ 
paring the studies, the MISTRA study cov- 

30 These twins would have been at least thirty-three 
years old. They all would have been born either 
during the worldwide depression or during World 
War II. Sweden was neutral in the war, and received 
substantial numbers of refugee children from neigh¬ 
boring countries. 


ered a wider age range, while the Swedish 
study had relatively more participants in the 
fifty-plus age range. The MISTRA study had 
considerably more data on any one individ¬ 
ual than did the Swedish study. Because of 
its longitudinal design, the Swedish study 
provided an opportunity to look at changes 
in cognition at the time of life when some 
people show marked cognitive decline. 

The MISTRA data has been ana¬ 
lyzed using Johnson and Bouchard's g-VPR 
model, 31 producing heritability estimates for
g and the lower-stratum abilities, includ¬ 
ing the broad VPR factors. In addition, the 
MISTRA investigators were able to deter¬ 
mine whether or not the different abili¬ 
ties are influenced by the same genetic fac¬ 
tors. This is determined by calculating the 
genetic correlation, which can be thought of 
as an estimate of the correlation between the 
genetic influences on each of the two abili¬ 
ties being compared. The idea is shown, in 
diagrammatic form, in Figure 8.9. 
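
One way to see what a genetic correlation adds is through a standard identity from quantitative genetics: the part of the observed correlation between two traits that is carried by shared genetic influences equals the product of the square roots of the two heritabilities and the genetic correlation. The sketch below is a minimal illustration. The values are hypothetical, not MISTRA results, and the function name is my own.

```python
import math


def genetic_share_of_correlation(h2_x: float, h2_y: float, r_genetic: float) -> float:
    """Part of the phenotypic correlation between traits X and Y that
    runs through correlated genetic influences.

    Computed as sqrt(h2_x) * sqrt(h2_y) * r_genetic.
    """
    return math.sqrt(h2_x) * math.sqrt(h2_y) * r_genetic


# Hypothetical cases: highly heritable traits with mostly distinct genetic
# influences, and weakly heritable traits with largely common ones.
high_h2_low_rg = genetic_share_of_correlation(0.8, 0.8, 0.2)
low_h2_high_rg = genetic_share_of_correlation(0.2, 0.2, 0.9)
print(round(high_h2_low_rg, 2), round(low_h2_high_rg, 2))
```

The two cases give similar genetic contributions to the observed correlation for quite different reasons, which is why heritability and genetic correlation have to be reported separately.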

31 Johnson et al., 2007. 
Panel 8.7. Studies of Twins Adopted
Apart: II. The Minnesota Study of 
Twins Raised Apart (MISTRA) 

Cyril Burt's emphasis on the study of MZ 
and DZ twins reared apart (panel 8.6)
was a good idea, whatever the demerits of 
his own research. Both before and since 
Burt's time several studies of MZ and DZ 
twins have been attempted, but only a 
few of them have involved a substantial 
number of twins reared apart. The rea¬ 
son is simple: when one combines the 
criterion “being an MZ twin” and “being 
adopted away from your birth partner” 
you do not have very many people 
left. 

In 1979 a group at the University of 
Minnesota, under the general leadership 
of Thomas Bouchard, Jr., embarked on 
an attempt to find as many twins raised 
apart as they possibly could. Recruitment 
was by advertising and word-of-mouth. 
By the time recruitment was terminated, 
in 2000, 139 twin pairs had been enrolled. 
In addition to the twins, Bouchard and 
his group interviewed and tested as many 
related people, including spouses, as they 
could find. Eventually over 400 people 
participated. The testing took approxi¬ 
mately fifty hours, over a week-long visit 
to Minneapolis, and included personality 


tests, physical measurements, and collec¬ 
tion of biographic data as well as exten¬ 
sive intelligence testing. The result has 
been the compilation of one of the most 
important and extensive databases on 
twins raised apart that exists. 

Bouchard and his colleagues deserve 
great credit for their perseverance. Espe¬ 
cially in the early years, government grant 
money for this important research was 
hard to come by, in no small part because 
of the political reaction against the idea 
that there are genetic influences on intelligence
(see panel 8.6). The situation was
not helped by the questionable status 
of Burt’s widely discussed studies, but 
I think that the reaction would have 
occurred anyway, due to deeply held 
political/social beliefs that disparities in 
education and socioeconomic status in 
the United States are almost entirely due 
to environmental causes, often associated 
with past prejudice and privilege. Obvi¬ 
ously I cannot prove this. Bouchard has 
expressed similar concerns.* 

In addition to the investigators, the 
University of Minnesota deserves credit 
for its willingness to support Bouchard 
and his group in this important, extended 
research effort. 

* Bouchard, 1997, pp. 126-127. 


Figure 8.10 shows heritabilities (h²) calculated
for the latent traits at each
stratum. The figure shows the division 
between genetic and environmental associa¬ 
tions within each of the latent traits at each 
level - that is, the g level, the three VPR 
dimensions at the third level, and the nar¬ 
rower cognitive skills at the second level. 
With the exception of the perceptual trait, 
every third- and fourth-level trait has a her- 
itability greater than .40. General intelli¬ 
gence (g) has a heritability value of more 
than .70. The genetic correlations between 
traits were substantial, showing that 
related genetic influences contribute to all 
abilities. 


Similar results were found in the Swedish 
study, where the investigators used the 
scores provided by the WAIS rather than 
the g-VPR model. The heritability estimate 
for the g factor was .91! This is the highest 
heritability estimate for intelligence that I 
have seen, in any substantial study. 32

8.3.3. The Unfolding of Cognitive Abilities 

Turning back to Figure 8.8, we can see that in 
the Dutch study there was a regular increase 
in the heritability coefficient from childhood 
to age fifty. We see the same trend across 

32 Reynolds et al., 2005, Table 5. 
Figure 8.9. A schematic of genetic correlations
involving the P and R factors in the VPR model
of intelligence. The genetic correlation,
r(G_p, G_r), is a measure of the extent to which
the genes responsible for each trait covary with 
each other. Genetic correlations can vary 
independently of the genetic loadings. 

Therefore, two traits might have high 
heritability and a low genetic correlation, or a 
low (but nonzero) heritability and a high genetic 
correlation. 

studies. In the Colorado Adoption Study 
the heritability coefficient was .56 at age 
sixteen; 33 in the Swedish study it was .91 for 
people age fifty and older. At first this defies 
logic, for, as Bouchard put it to me in pri¬ 
vate correspondence, one would think that 
the slings and arrows of outrageous fortune 
would accumulate, producing environmen¬ 
tal differences over the years. Are there clues 
in the data indicating why this result might 
occur? 

I am struck by two pieces of evidence. 
The first is that the heritability estimate is 
influenced by the difference between twin 
correlations in MZ and DZ twins; the larger 
the difference, the larger the heritability 
estimate. In a review of the data available up 
to 1989, 34 it was clear that the increase in the 
heritability estimates was due to a decrease 
in the DZ correlation from .6 (childhood) to 
.4 (adult) while the MZ correlations stayed 
constant at slightly above .8. The second 
piece of evidence comes from the Swedish 
study of twins over fifty. Recall that in 
that study cognitive abilities were measured 
several times, making it possible to assess 

33 Plomin et al., 1997. 

34 McGue et al., 1993, Figure 1. 


both the overall cognitive level of partic¬ 
ipants over the fifty-to-eighty interval and 
the change in cognitive abilities over that 
interval. As would be expected, there was a 
general decrease in cognition as the partici¬ 
pants moved into their sixties and seventies. 
Structural equation modeling showed that 
this decrease was largely due to nonshared 
environmental differences, the differences 
between twins, rather than the environmen¬ 
tal differences across twin pairs. Note that 
“nonshared environmental differences” has
a rather different interpretation when the 
comparison is based on adults living apart, 
rather than on children living in the same 
household. 

I suggest that there are two processes 
going on. One is that some genetically influ¬ 
enced biological processes present them¬ 
selves only during adulthood and, especially, 
old age. Huntington’s disease (panel 8.1) 
and incipient signs of Alzheimer’s dementia
are examples. Among the healthy
elderly, genetically influenced limitations 
on information-processing capacities may 
become more constraining as we grow older. 
These phenomena may account for a sub¬ 
stantial part of the increase in heritability. 

This sort of genetic limit would not 
be surprising from an evolutionary point 
of view. Genetic weaknesses that are not 
expressed until after the reproductive years 
do not affect a person’s reproductive fitness, 
so there would be no selective pressure on 
the genes involved. 

The second process is quite different. It is 
suggested to me by the constancy of the MZ 
correlations as the DZ correlations fall. To 
a substantial degree, people make their own 
environments. Genetic influences, acting as 
distal causes, influence exposure to environ¬ 
ments that, acting as proximal causes, dif¬ 
ferentially accelerate or decelerate cognitive 
change. DZ twins, being genetically more 
distinct than MZ twins, will select environ¬ 
ments that differ more than the environ¬ 
ments selected by MZ twins, and, as a result, 
the DZ correlations will go down over time, 
due to nonshared (between twins) environ¬ 
mental differences being greater in the DZ 
than in the MZ twins. 
Figure 8.10. The relative proportions of variance associated with
genetic and environmental variance in the factors defined for the 
g-VPR model. Data based on Johnson et al., 2007, Table 5. 


This analysis is a conjecture. Proof will 
have to wait for further research. 

8.3.4. The Heritability of 
Information-processing Traits 

While there have been studies of the genetic 
basis of a large number of information¬ 
processing traits, our interest centers on 
the two information-processing functions 
shown to be most closely related to intelli¬ 
gence (Chapter 6), speed of processing and 
working memory. For the reasons given in 
Chapter 6, I will be concerned with work¬ 
ing memory, overall, and will not attempt to 
fractionate it into finer components. 

Both working memory and processing 
speed show substantial genetic heritability. 
The genetic correlations indicate that the 
genetic basis of the information-processing 
components is the same or nearly the same 
as the genetic basis of general intelligence. 
Most of our knowledge of this comes from 
studies of twins. Four such studies will be 
described. They have been selected to make 
a point about consistency of results. 

The first of the studies was of Dutch 
twins, born in the early 1990s and evaluated 
at five and twelve years of age. 35 Siblings

35 Polderman et al., 2006, 2007. See Polderman et al., 

2007, for a discussion of previous work on this topic. 


were also studied. Participants took a com¬ 
prehensive battery of markers for psycho¬ 
metric g and tasks evaluating the speed and 
capacity of working memory. Heritability 
estimates for both speed of processing and 
the capacity of working memory were in 
the .5-.6 range, increasing slightly from age 
five to age twelve. Stability, in the sense 
of the extent to which scores at age five 
could predict subsequent scores, appeared 
to be largely mediated by genetic influences. 
The authors point out that this is impres¬ 
sive, because the brain undergoes substan¬ 
tial development from five to twelve years 
of age. 

An Australian study 36 evaluated genetic
contributions to information processing and 
intelligence test scores in sixteen-year-old 
twins and their siblings. Heritability esti¬ 
mates for choice reaction time tasks var¬ 
ied from .7 to .5, with heritability increas¬ 
ing as the number of choices increased. A 
delayed reaction time task was used to eval¬ 
uate working memory. It had a heritability 
estimate of .48. 

In Japan a study was conducted of young 
adult twins (aged fourteen to twenty-nine, 
mean age twenty). The twins were given 
an intelligence test and both verbal and 
spatial-visual working memory tasks. The 

36 Luciano, Wright, et al., 2001. 
heritability coefficients for the working
memory tasks ranged from .43 to .48, depending
on the task. 37

In the United States a twin study was con¬ 
ducted utilizing records maintained by the 
US Department of Veterans’ Affairs (VA), 
which is responsible for ongoing studies of 
the health of military veterans. 38 Just under
350 pairs of male twins, ranging in age from 
forty-one to fifty-eight, were given the read¬ 
ing span measure of working memory (see 
Chapter 6}. The estimate of h 2 based on 
the difference in correlations between MZ 
and DZ twins was .58. This was reduced 
slightly by more sophisticated modeling, but 
it was clear that the estimate should be in the 
.5-6 range. Further analyses indicated that 
the correlation between the reading span 
measure and measures of reading skill were 
mediated genetically. 

These four studies, done with partici¬ 
pants of different ages and conducted by 
different laboratories in different countries, 
have produced consistent results. Work¬ 
ing memory, one of the key information¬ 
processing underpinnings of intelligence, 
is subject to substantial genetic influence. 
However, the influence seems to be some¬ 
what smaller than the genetic influence on 
measures of g. 

Establishing the genetic basis for speed 
of cognitive processing is a bit compli¬ 
cated. Speed of cognitive processing has 
been shown to be a component of intelligence 
apart from the storage and attentional 
control components of working memory 
(Chapter 6). However, in above-average 
young adults (usually college students) the 
contribution of processing speed to conven¬ 
tional psychometric measures of intelligence 
is considerably smaller than the contribution 
of the working memory functions. By con¬ 
trast, the decline in measures of psychome¬ 
tric intelligence from middle age onward is 
very largely associated with declines in speed 
of processing. This leads us to suspect that 
studies of college students may not tell us all 
that we need to know about the importance 

37 Ando, Ono, & Wright, 2001. 

38 Kremen et al., 2007. 
of speed of processing in the population at 
large. 

There are many ways to measure speed 
of cognitive processing. Most conven¬ 
tional intelligence tests use pencil-and-paper 
methods, where the dependent measure is 
how many easy tasks can be accomplished in 
a fixed time period (e.g., simple additions). 
Researchers, especially those who have dealt 
with information-processing measures in 
contexts other than studies of individual dif¬ 
ference, prefer the tighter control afforded 
by computer-controlled tasks, such as the 
choice reaction time (CRT) and inspection 
time (IT) measures. Within each of these 
broad paradigms, and especially within the 
information-processing paradigm, there are 
many variations in procedure. 

In spite of these difficulties, studies of 
the genetic basis of speed of cognitive pro¬ 
cessing have been remarkably successful. A 
meta-analysis of several studies concluded 
that heritability was about .18 for easy timed 
tasks (which do not have high correlations 
with test scores) and .52 for hard tasks. 39 
Looking at the details of a few of the studies 
is informative. 

In the study of Dutch twins referred to 
earlier, the investigators found that, depend¬ 
ing on the particular speed-of-processing 
task, heritability coefficients fell in the 
.4-.5 range. 40 Moving to adolescents, investigators 
in the Colorado adoption study (panel 
8.4) studied the genetics of speed of processing 
in sixteen-year-olds, using pencil-and-paper 
tasks. This study found a speed-of-processing 
heritability of .48. 41 Literally on 
the other side of the world, the estimated 
heritability of inspection time was .80 in the 
Australian study of sixteen-year-old twins. 42 
This is the highest reported value of heri¬ 
tability that I know of. 

One of the studies associated with the 
Dutch Twin Registry examined heritability 
in two cohorts, one aged twenty to thirty 
at the time of the examination and another 

39 Beaujean, 2005. See also Jensen, 2006, p. 130 ff., for 

some brief notes concerning other studies. 

40 Polderman et al., 2006. 

41 Alarcon et al., 1999. 

42 Luciano, Smith, et al., 2001. 
Figure 8.11. Heritability estimates for three academic skills, 
based on data from the TEDS study (Chapter II). NC refers 
to teachers' ratings of student progress on the United 
Kingdom national curriculum in reading or mathematics. 


aged forty to fifty-five. The heritability 
estimate for the inspection time measure 
was .46, and did not differ across cohorts. 43 

The Swedish study of twins raised apart 44 
contained participants from fifty to eighty 
years old. The study included pencil- 
and-paper measures of perceptual speed. 
Because this was a repeated-measures study 
it was possible to estimate the extent of 
heritability and environmental influences on 
both overall perceptual speed (as evaluated 
by pencil-and-paper testing) and changes 
in perceptual speed. The heritability coef¬ 
ficient was around .8 for overall speed 
(depending slightly on which of two tests 
was used to measure speed), but the major 
influence on change was the nonshared envi¬ 
ronmental difference. 

8.3.5. QBG Analyses of Academic Skills 

Conventional intelligence tests evaluate 
important cognitive traits, but any reason¬ 
able theory of intelligence has to include 
cognitive traits that are not evaluated by the 
tests, and quite possibly cannot be evaluated 
in the context of the time-limited “Drop 
in from the Sky" testing session. When dis¬ 
cussing the genetics of intelligence it makes 
sense to ask what contribution genetic inher¬ 


itance makes to cognitive skills that are 
important in the world, but lie outside the 
conventional testing realm. Among the most 
important of these are skills in reading and 
elementary mathematics, because of their 
central role in school and the workplace. 

The largest study of genetics and academic 
achievement to date, and probably the 
largest that will be conducted for some 
time, is the Twins Early Development Study 
(TEDS), a United Kingdom study of twins 
involving 12,000 participating families. 45 The 
study is described in panel 8.8. Testing was 
carried out in infancy, where the evalua¬ 
tion emphasized language development, and 
again at seven, nine, and ten years of age. 
At each age, age-appropriate evaluations 
were made of children’s progress in English- 
language studies, mathematics, and science. 
These evaluations were supplemented by 
teachers' ratings of student progress. 

Figure 8.11 shows the estimated heritabil¬ 
ity for two classes of variables: teacher rat¬ 
ings and tests of children’s mathematics, 
reading, and science levels at three different 
ages. The estimates range from a minimum 
of .4 to a maximum of .7. In all cases the 
heritability estimate exceeded the estimate 
for percentages of variance associated with 
either shared or nonshared environments, 


43 Posthuma, de Geus, & Boomsma, 2001. 

44 Finkel & Pedersen, 2004; Reynolds et al., 2005. 

45 Kovas et al., 2007. 






























Panel 8.8. The Twins Early 
Development Study (TEDS) 

TEDS was a study of over 12,000 families, 
recruited from some 25,000 families who 
had twin births in the 1994-96 period, 
according to the United Kingdom (UK) 
National Health database. The children 
were tested when they were seven, nine, 
and ten years of age as, according to the 
UK’s curriculum plan, these represent 
critical ages in cognitive development. 

In-person testing was generally not 
possible. Therefore, the investigators 
relied on two sources of data: teachers’ 
ratings of how well the students were 
doing in Reading, Mathematics, and Sci¬ 
ence, and some cleverly designed "dis¬ 
tance tests" that could be administered 
(with parental cooperation) over the tele¬ 
phone or, for older children, over the 
World Wide Web. Verification studies 
involving in-person testing were con¬ 
ducted in order to check on the accu¬ 
racy of the distance tests. Teacher rat¬ 
ings and the test results told basically the 
same story. Heritability was the largest 
single factor determining variation in 
all skills. Nonshared (basically within-family) 
environment was the second-largest 
factor. 

The conclusions of the TEDS study 
that relate to our understanding of intel¬ 
ligence are presented in the main text. 
Here I will consider two further aspects 
of this large, important study: how widely 
its results can be generalized, and what 
these results mean for education, espe¬ 
cially special education programs for chil¬ 
dren who are not doing well. 

TEDS found that in a broad sense 
genetic heritability was the single largest 
influence on test scores. There was no 
evidence that very poor performance, in 
the bottom 5% or 15% of the popula¬ 
tion, was due to some special condition 
(e.g., a genetic condition that impacted 
ability to acquire language skills). Putting 
this somewhat colloquially, according to 
the TEDS results many of the children 


who, in the US, would qualify for special 
education in reading and mathematics 
aren’t really special; they are just at the 
lower end of the distribution of reading 
and mathematics skills. Against this, we 
have to remember that there are specific 
genetic bases for a number of moderate 
mental disabilities, including reading dis¬ 
abilities. By lumping together all children 
below a certain ability level, the TEDS 
investigators may have developed a pic¬ 
ture of disability that is accurate at a large 
scale, but that overlooks some important, 
albeit uncommon, specific disabilities. 

Environmental effects upon the means 
of affected groups are well established. 
For instance (as is documented in the 
next chapter), intensive preschool edu¬ 
cational programs can improve the aca¬ 
demic readiness of children from very 
low SES groups. This indicates that bet¬ 
ter school environments can help, a point 
that has been amply demonstrated in a 
variety of settings.* Whether these pro¬ 
grams will do anything to diminish the 
variation in performance between chil¬ 
dren is another matter. 

The fact that genetic influences are 
so strong suggests that genetic analysis 
could serve as an early warning signal for 
identifying children who might have diffi¬ 
culty acquiring language and mathemat¬ 
ical skills. The genetic analysis could be 
as simple as identifying a child whose rel¬ 
atives have had trouble with academics 
in the past, or it could be as complex as 
actually identifying the child’s genome. 
We are not quite ready for the latter sort 
of testing. Leaving aside the costs of such 
a program, which are dropping rapidly, 
we do not know, yet, just what genes 
and alleles to look for! In addition, using 
genetic indicators as guides for educa¬ 
tional decisions raises social policy issues 
that are quite beyond the scope of this 
book. 

The TEDS analyses of children at the 
low end of mathematics, reading, and 


science competency indicate that, in the 
vast majority of cases, these children rep¬ 
resent the low end of normalcy rather 
than specific genetic or environmental 
defects. This has no implications at all 
for policy. Setting the level at which 
schoolchildren are considered for special 
education, in any field, has to be deter¬ 
mined by two things: the level of com¬ 
petency the child requires in order to 
function in the society, and the money 
available to enroll students in special edu¬ 
cation programs. 


Why a particular child falls into the 
defined class is of interest only if it serves 
as a guide to treatment, to tell the teacher 
how to fix the problem. The finding that 
the special class is actually the low end of 
the normal distribution is encouraging, 
because it indicates that the special edu¬ 
cation class will be reasonably homoge¬ 
neous, so that uniform teaching methods 
may apply widely. What those methods 
should be is a topic beyond the current 
discussion. 

* Nisbett, 2009, provides an extensive list of exam¬ 
ples and discusses the difficulty of implemen¬ 
tation. 


and heritability often exceeded total envi¬ 
ronmental influences. 

Genetic correlations were high, indicat¬ 
ing that substantially the same genetic influ¬ 
ences were being expressed in all topics. 
This is consistent with QBG analyses of 
psychometric tests (as discussed earlier); 
genetic influences appear to express them¬ 
selves more on g than on specific cogni¬ 
tive skills. The environmental influences 
appeared to be unique to each of the three 
skills - reading, math, and science. Environ¬ 
mental effects appeared to control devia¬ 
tions from a genetically related stable path, 
either upward or downward, rather than 
individual differences in the average per¬ 
formance of a child, across time. As has 
been the case with studies of genetic influ¬ 
ence on intelligence test scores, the envi¬ 
ronmental contribution was largely due to 
nonshared environmental differences, envi¬ 
ronmental differences between twins or sib¬ 
lings, rather than across families. 

Educators (and parents) have long 
debated whether particularly poor school 
performance is an indication that a child 
has a specific genetic anomaly or whether 
children toward the bottom end of the 
performance scale have general educational 
deficits. To shed light on this question the 
TEDS researchers asked whether the mod¬ 
els that applied to all children in the study 


also applied to those in the lowest fifth 
or fifteenth percentiles. In the American 
context, low-performing children are often 
assigned to “special education” classes. The 
reduced samples still contained several hun¬ 
dred cases. In general, the same models that 
applied to the entire sample applied to the 
reduced samples. 

This suggests that poor academic perfor¬ 
mance is generally the result of an overall 
cognitive deficit, rather than a specific prob¬ 
lem. It is true that some cases of impaired 
learning skills have been identified with cer¬ 
tain genomes. 46 However, the known spe¬ 
cific disabilities account for only a small frac¬ 
tion of cases of poor reading or mathematics 
performance. 

8.3.6. Summary Comments on 
Quantitative Behavior Genetics 

Anyone familiar with the “now you find 
it, now you don’t” phenomena that plague 
the social sciences has to be struck by the 


46 Willcut et al., 2004. An example of such specific 
genetic anomalies is the FOXP2 gene, which was 
associated with very poor linguistic performance in 
a pedigree study of a single British family (Lai et al., 
2001). The relevant allele was not found in any of 
the 270 cases of poor language performance in the 
TEDS study (Kovas et al., 2007, p. 108). 
consistency of the behavior genetic findings. 
To review briefly: 

1. Additive genetic heritability accounts 
for 40-80% of the variance in virtually 
all cognitive traits. 

2. Somewhere around half of the variance 
is common to all traits. This strongly 
suggests that there are generalist genes 
that influence brain structures underly¬ 
ing virtually all cognitive performance. 

3. Environmental effects (including the 
prenatal environment) are strongest in 
early childhood. They diminish there¬ 
after. 

4. Childhood environmental effects are 
primarily due to within-family differ¬ 
ences - the way a child interacts with 
his or her familial environment - rather 
than being due to between-family envi¬ 
ronmental differences. Nonshared envi¬ 
ronmental influences are also the largest 
environmental influences in adults, but 
this now refers to life experiences 
that differentiate twins and siblings, 
rather than (in most cases) to family 
environments. 

5. While genetic influences are strong in 
establishing the trajectory of a person's 
cognitive development and the later 
reduction of cognitive ability in old age, 
variations from that trajectory are influ¬ 
enced by environmental factors. 

There are a few caveats that must be kept 
in mind. I will present them in inverse order 
of their importance, from least serious to 
most serious. 

Twin studies loom large in the QBG 
study of intelligence. Heritability estimates 
from such studies are heavily influenced 
(and in some cases dictated) by the contrast 
between DZ and MZ twins. The rough and 
ready method of estimating the heritability 
of trait X as $2\,(r_{MZ}(X, X') - r_{DZ}(X, X'))$ is 
a good example. This contrast is based on 
the assumption that MZ twins have identical 
genotypes, which they do, and that DZ 
twins share half of the permissible human 
variation: $r_{MZ}(G, G') = 1$ and $r_{DZ}(G, G') = 1/2$. 
The assumption about MZ twins is unarguable. 
The assumption about DZ twins 
depends on the further assumption that the 
parental genetic correlations are zero; there 
is no tendency for males and females to 
select mates of similar genetic background. 
This may be true in fruit flies and fish, but it 
is not in humans. The correlation between 
cognitive test scores for spouses generally 
runs in the .2-.3 range. This tendency is 
called assortative mating. 
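
To make the arithmetic behind the rough-and-ready estimate concrete, here is a minimal Python sketch. It assumes the simplest additive model, in which the twin correlations are r_MZ = h² + c² and r_DZ = g·h² + c², where c² is the shared-environment share of variance and g is the genetic correlation between DZ twins. The function names, the symbol g, and every numerical value are illustrative assumptions, not figures taken from the studies cited.

```python
def falconer_h2(r_mz, r_dz):
    """Rough-and-ready estimate: twice the MZ-DZ correlation difference.

    Assumes DZ twins have a genetic correlation of exactly 1/2.
    """
    return 2.0 * (r_mz - r_dz)


def h2_given_dz_genetic_corr(r_mz, r_dz, g):
    """Same contrast, but with the DZ genetic correlation g left free.

    Under r_MZ = h2 + c2 and r_DZ = g*h2 + c2, the difference is
    (1 - g) * h2, so h2 = (r_MZ - r_DZ) / (1 - g).
    """
    if not 0.0 <= g < 1.0:
        raise ValueError("g must lie in [0, 1)")
    return (r_mz - r_dz) / (1.0 - g)


if __name__ == "__main__":
    # Hypothetical twin correlations, chosen only for illustration.
    r_mz, r_dz = 0.70, 0.41
    print(round(falconer_h2(r_mz, r_dz), 2))
    # If assortative mating pushed g above 1/2 (0.6 is a made-up value),
    # the same data would imply a larger h2 than the doubling rule reports.
    print(round(h2_given_dz_genetic_corr(r_mz, r_dz, 0.6), 2))
```

With g fixed at 1/2 the second function reduces to the first. Under this simple model, a DZ genetic correlation somewhat above 1/2 would mean that the doubling rule understates h².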

As yet, we do not have a good idea of 
the extent to which assortative mating has 
to be considered. The topic is a difficult one 
to study, because the analysis requires test 
scores from both parents and children. Nev¬ 
ertheless, some studies are being done, and 
hopefully more will occur in the future. 47 

Most of the analyses used to deter¬ 
mine heritability do not allow for gene- 
environment interactions or correlations. 
The failure to allow for interactions is a 
major deficiency of present QBG ap¬ 
proaches. Why is it a problem, and why 
haven't QBG analyses allowed for it? 

We know gene-environment interactions 
exist, for we can point to some. Many genet¬ 
ically influenced traits, such as alcoholism, 
involve a potential for a pathology that is 
released by environmental agents. As we get 
a better understanding of gene-environment 
interactions they are likely to play a larger 
role in our understanding of the inheritance 
of intelligence. Gene by environment cor¬ 
relations may also be very important. Sev¬ 
eral studies have reported higher heritability 
coefficients in middle and upper SES pop¬ 
ulations than for lower SES populations. 48 
This is in itself an interaction in the statisti¬ 
cal sense. But why would this occur? Several 
mechanisms have been postulated. 

One argument is that harsh social envi¬ 
ronments will restrict the expression of 
genetic potential, and that therefore genetic 
variation can express itself only in an environment 
that nurtures cognition. 49 There is 
an analogy to weight: in a famine-stricken 
town everyone is emaciated; when food is 

47 See Colom, Aluja-Fabregat, & Garcia-Lopez, 2002; 

Reynolds, Baker, & Pederson, 2000; van Leeuwen, 

Van den Berg, & Boomsma, 2008. 

48 Harden, Turkheimer, & Loehlin, 2007; Rowe, 2004; 

Turkheimer et al., 2003. 

49 Bronfenbrenner & Ceci, 1994. 
available genetic tendencies toward obesity 
can express themselves. Substitute “getting 
something out of education” for weight and 
“bad schools” for famine and you have the 
idea. 

A second argument is that society con¬ 
tains a number of positive feedback mecha¬ 
nisms that serve to increase variation, espe¬ 
cially in the high range of the scale. Jim 
Flynn, a professor of political science at 
the University of Otago, New Zealand, has 
been a particularly strong advocate of this 
hypothesis. 50 One of his frequently used 
analogies is to basketball. Initially a tall, 
strong young person may be singled out for 
special playing and coaching opportunities. 
The better the player, the more coaching the 
player gets, thus increasing the variance in 
skill between the initially mediocre and ini¬ 
tially somewhat better players. Applied to 
cognitive skills, this suggests that if a school 
system practices “streaming” talented and 
untalented students onto different tracks, 
with different qualities of instruction, one 
can anticipate a rich-get-richer, poor-get- 
poorer phenomenon, in both basketball and 
intelligence. Initial, genetically produced 
talent is nurtured by the environment. 

Both these explanations of the observed 
changes in h² with environment rest on 
interactions between initial talent and 
opportunities to develop talent. There is a 
third argument that is equally reasonable 
but has an entirely different basis. The issue 
may not be that higher SES environments 
permit the expression of greater genetic 
effects; it may be that lower SES environ¬ 
ments have greater environmental ranges in 
variables that are relevant to intelligence. To 
put things graphically, the cognitive benefits 
of sending a child to a good, average middle 
class preschool, or even of not sending the 
child to preschool at all, but spending a lot of 
time interacting with the child at home, may 
not be very far behind the benefits of send¬ 
ing a child to an exceptionally expensive 
preschool. The differences between keeping 
children in a chaotic atmosphere, with very 
little encouragement to explore or express 
themselves, compared to sending the same 

50 Dickens & Flynn, 2001; Flynn, 2007. 


children to a modestly funded preschool 
three or four days a week may be profound. 
This sort of effect is not a gene-environment 
interaction. The effect is due to greater envi¬ 
ronmental variation in the lower SES envi¬ 
ronment compared to the higher SES 
environment. 
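
The third argument is a matter of simple arithmetic, sketched below in Python. The sketch treats heritability as the ratio of genetic variance to total variance and assumes that genetic and environmental contributions are additive and uncorrelated. The function name and the variance figures are hypothetical, chosen only to illustrate the point.

```python
def heritability(var_genetic, var_environment):
    """Proportion of total variance attributable to genetic variance.

    Assumes additive, uncorrelated genetic and environmental effects,
    so total variance is simply the sum of the two components.
    """
    total = var_genetic + var_environment
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_genetic / total


if __name__ == "__main__":
    var_g = 100.0             # same genetic variance in both groups (made up)
    var_e_higher_ses = 50.0   # narrower range of relevant environments (made up)
    var_e_lower_ses = 150.0   # wider range of relevant environments (made up)

    print(heritability(var_g, var_e_higher_ses))
    print(heritability(var_g, var_e_lower_ses))
```

The genetic variance is identical in the two groups, yet the heritability ratio is lower where the environmental variance is larger, with no gene-environment interaction involved.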

I find these arguments for gene- 
environment interactions and correlations 
compelling. However, I see very little hope 
of untangling the effects by the use of 
structural equation models that make gross 
assumptions about how the environment 
covaries between people of different degrees 
of genetic relatedness. We need direct mea¬ 
sures of the environmental variables that 
influence cognition. Without a theory of 
environmental action it is hard to know 
where to begin. 

And that is the rub. In an article defend¬ 
ing much of the research on twin stud¬ 
ies and, by implication, most other QBG 
studies of intelligence, Bouchard 51 pointed 
out that most of the claims that addi¬ 
tive heritability coefficients are due to hid¬ 
den gene-environment interactive effects are 
arguments about what might be the case. 
The arguments are sometimes accompanied 
by analogies to gene-environment interac¬ 
tions in plants, but seldom if ever by an 
example involving human intelligence. In 
this respect, Flynn’s apocryphal basketball 
example is interesting, but an actual exam¬ 
ple involving human intelligence would 
have been more convincing. 

Bouchard made two arguments to 
counter the interactionists. Both have to 
do with the strategy one follows in scien¬ 
tific research. First, he observed that many 
of the arguments appealed to what ought 
to have been measured in various studies. 
My own concerns, as just expressed, are of 
this nature. Any result can be explained by 
appeal to unmeasured variables. 52 Second, 
scientists generally accept a principle known 
as Occam's razor. Given two equally accu¬ 
rate explanations of the same phenomenon, 
one should always prefer the simpler one. 

51 Bouchard, 1997. 

52 I wish I could take credit for this statement, but I 

cannot. I heard Bouchard say it in a public meeting. 
QBG studies have shown that the assumption 
of substantial additive genetic variance 
can account for a great deal of data on 
the covariation of intelligence across people 
of different degrees of genetic relatedness. 
Therefore, the simple additive model 
deserves to be number one in a competition 
between theories, at least until someone comes 
along with data to back up the more complicated 
interaction models. 

QBG tells us how much heritability there 
is for intelligence within a given population. 
The only thing it says about the mechanism 
of inheritance is an appeal to the abstract 
Mendelian concept of a gene. To go further 
we have to ask what a gene is and how it 
influences the development of the brain. 

8.4. The Molecular Genetics of 
Intelligence 

Mendel established the logical basis of 
genetic inheritance, but knew nothing of 
its physical mechanism. That is the topic of 
molecular genetics, a field that started at the 
beginning of the twentieth century with 
the discovery that the chromosomes were 
the bearers of the genetic material. Discovery 
after discovery followed (along with 
several Nobel Prizes), the key one being 
James Watson and Francis Crick’s discovery 
of the structure of the genetic material, 
deoxyribonucleic acid (DNA), in 1953. 53 
Genetic inheritance turns out to be a 
very complicated process. Fortunately a 
simplified model of the process is sufficient 
for our purposes. 54 

The material in the chromosome is a 
DNA molecule, which has the structure of a 
double helix, two strands of material twisted 
around each other. The strands are made 
up of four structures called bases. The four 
bases are adenine (A), thymine (T), guanine 
(G), and cytosine (C). The strands of the 
helix are bound together because the bases 
bind to each other, forming units called base 
pairs. The binding is unique. An A always 

53 Watson & Crick, 1953. 

54 Watson (2003) presents a well-written history and 

discussion of the twentieth-century discoveries. 
pairs with a T, and a C with a G. Logically, 
the DNA molecule can be thought of 
as a sequence, disregarding its helical structure. 
The sequence constitutes a person's 
genome. This is why the term "sequencing 
the genome" is sometimes used to refer to 
the process of identifying a genome. 
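
Because the pairing is unique, one strand determines the other, which is why the molecule can be treated as a single sequence. A minimal Python sketch of that idea follows. The function name and the example sequence are illustrative assumptions, and the sketch ignores the chemical orientation of the two strands.

```python
# Pairing rule stated in the text: A pairs with T, and C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand):
    """Return the bases that pair, position by position, with `strand`."""
    try:
        return "".join(PAIRING[base] for base in strand.upper())
    except KeyError as err:
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None


if __name__ == "__main__":
    strand = "ATGCCGTA"  # made-up sequence for illustration
    print(strand)
    print(partner_strand(strand))
```

Applying the function twice returns the original strand, which is another way of saying that either strand carries the full sequence information.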

The bases form triplets, sequences of 
three bases. As there are four possible bases, 
there are sixty-four possible triplets. These 
are called codons. Each of the sixty-four pos¬ 
sible codons either marks the starting and 
stopping points of the codon sequence that 
makes up a gene, or specifies the creation 
of one of twenty amino acids, which are 
the building blocks of enzymes and pro¬ 
teins out of which cells, and hence life, are 
constructed. Therefore, a protein-encoding 
gene is, physically, the sequence of codons 
between a “starting” and “stopping” codon. 
The sequence of bases within this interval 
provides the program for cellular mechanisms 
that initiate the construction of one or 
more proteins, just as the magnetic charges 
on a digital video disk (DVD) provide the 
program that initiates the display of pictures 
and playing of music on a modern television 
set. Since Watson and Crick's discovery, one 
of the major tasks of behavior genetics has 
been to find out how this program is read to 
produce proteins, then cells, and eventually 
an organism. 
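
The count of sixty-four follows directly from having four bases in each of three positions (4 × 4 × 4). The short Python sketch below enumerates the triplets and shows how a sequence can be read three bases at a time. The names and the example sequence are illustrative assumptions, and no attempt is made to map codons to amino acids.

```python
from itertools import product

BASES = "ACGT"

# Every ordered triplet of the four bases: 4 * 4 * 4 = 64 codons.
ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]


def split_into_codons(sequence):
    """Read a base sequence three bases at a time, dropping any remainder."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    print(len(ALL_CODONS))
    print(split_into_codons("ATGGCCTAA"))  # made-up sequence
```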

Variations in the sequence of bases within 
a gene define alleles, and hence variations 
in the program for building the organism. 
This provides the variation that evolution 
requires. 

Figure 8.12 shows how genes are passed 
on from one generation to the next. The 
figure illustrates meiosis, the process of con¬ 
structing a chromosome carried by a gamete 
(sperm cell in males, ovum in females). 
Recall that (for autosomes) there are two 
chromosomes. The two chromosomes link 
at a connecting point. A new chromosome 
is formed, containing a segment from each 
of the parental chromosomes. Therefore, 
base pairs that are located close to each 
other on a chromosome tend to be inherited 
together. In fertilization the chromosomes 
in the sperm and ovum unite, so that the 





Figure 8.12. The reshuffling of genetic material in meiosis. Each 
chromosome in a chromosome pair can be thought of as a sequence 
of base pairs. In meiosis the two chromosomes link together at some 
midpoint, and then a new chromosome is constructed from one 
segment of each of the parental chromosomes. Therefore, base pairs 
that are close to each other on a chromosome will tend to be 
inherited together. 


chromosome pair in the offspring contains 
one chromosome from each parent. 
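
The claim that neighboring base pairs tend to be inherited together can be checked with a small Python sketch. It adopts the simplified picture of Figure 8.12: exactly one linking point, assumed here to be equally likely at every interior position. Real meiosis is more complicated, and the function names and example strings are illustrative assumptions.

```python
def recombine(chrom_a, chrom_b, cut):
    """New chromosome: chrom_a up to `cut`, chrom_b from `cut` onward."""
    if len(chrom_a) != len(chrom_b):
        raise ValueError("chromosomes must be the same length")
    if not 0 < cut < len(chrom_a):
        raise ValueError("cut must fall strictly inside the chromosome")
    return chrom_a[:cut] + chrom_b[cut:]


def prob_inherited_together(i, j, length):
    """Chance that positions i and j come from the same parental chromosome,
    assuming exactly one cut, equally likely at each interior point."""
    if length < 2:
        raise ValueError("length must be at least 2")
    i, j = sorted((i, j))
    cuts = range(1, length)
    # Positions are split apart only when the cut falls between them.
    same = sum(1 for cut in cuts if not (i < cut <= j))
    return same / len(cuts)


if __name__ == "__main__":
    a, b = "AAAAAAAAAA", "bbbbbbbbbb"  # stand-ins for two parental chromosomes
    print(recombine(a, b, 4))
    print(prob_inherited_together(2, 3, 10))  # neighboring positions
    print(prob_inherited_together(0, 9, 10))  # opposite ends
```

Under these assumptions, adjacent positions are separated only when the cut lands exactly between them, whereas positions at opposite ends are separated by every possible cut.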

The process is somewhat different in the 
sex chromosomes. In females, who have two 
X chromosomes, the shuffling takes place 
as in the autosomes. In males, who have 
an X and Y chromosome pair, either the 
X or Y chromosome is carried over into a 
sperm cell. Accordingly, female zygotes con¬ 
tain two X chromosomes, one from each 
parent, but male zygotes contain a mater¬ 
nal X and a paternal Y chromosome. The Y 
chromosome represents a direct transfer of 
genetic material along the patrilineal line. 
While a male’s X chromosome is always 
inherited from the mother, that chromo¬ 
some might have come from either mater¬ 
nal grandparent, so an analogous continuous 
transfer of genes through the female line is 
not possible. 55 

The white and black dots in Figure 8.12 
indicate base sequences. The black dots indi¬ 
cate sequences that make up the protein- 

55 There is a way to trace female lineage. Mitochon¬ 
dria are organelles within a cell, but outside the 
nucleus. They contain their own DNA, mtDNA, for 
a total of thirty-seven genes, plus segments of non- 
coding DNA. MtDNA is inherited from the mother, 
so analysis of mtDNA provides a genetic record of 
female lineage. 


encoding genes. The white dots indicate 
base sequences outside of the genes. The 
totality of the base sequences, across all 
chromosomes, constitutes the genome. 

About 90% of the genome consists of 
sequences of bases that lie outside of 
the protein-encoding genes. There are also 
sequences of bases within the protein¬ 
encoding genes that do not code for proteins. 
At one time these sequences were thought 
to be inert and were referred to as "junk 
DNA." We now know that at least some of 
the sequences are active, because they regu¬ 
late the expression of the protein-encoding 
genes. In addition, some genes regulate the 
expression of other genes. 

There are many statements like "We share 
50% of our genes with each of our par¬ 
ents." These statements are a bit mislead¬ 
ing. The 50% figure refers to variation in 
the sequence of base pairs that can differ 
across human beings, and still produce a 
person. That is a small percentage of the 
total genome; roughly 99.9% of the genome 
is identical across humans, and we share 96% 
of the genome with our evolutionary cousin, 
the chimpanzee. 

Nevertheless, there are differences. A 
difference that occurs at a base pair in 
Panel 8.9. A Few Stray Facts About 
the Genome 

Here are a few facts about the genome. 

1. There are approximately three billion 
(3,000,000,000) base pairs in 
the human genome. However, the 
number of protein-encoding genes 
is much smaller. The current (2010) 
estimate is between 25,000 and 
30,000. This estimate is well below 
the estimates of 100,000 or more that 
were common as late as the year 2000. 

2. Each gene specifies an average of 
three proteins. Different proteins 
may be expressed in different parts 
of the body. It has been estimated 
that about one-third of human genes 
have some expression in the brain. 

3. A mutation occurs when a new 
allele arises. Some mutations arise 
due to errors in the process of 
DNA replication. Mutations can also 
be triggered by environmental haz¬ 
ards. These include nuclear radia¬ 
tion, ultraviolet light, and exposure 
to certain chemicals. Discouragingly, 
some of the hazardous chemicals are 
used in industrial processes. Finally, 


some mutations appear to occur “by 
chance,” which simply means that 
we do not know what the causative 
process is. 

4. Mutations are probably more fre¬ 
quent than we realize, for a great 
deal of the genome’s programming 
specifies basic processes in the cell. 
In such cases pregnancy either does 
not begin or is terminated without 
detection. In other cases termination 
occurs later, with a detectable preg¬ 
nancy, or a stillbirth occurs. Some 
mutations are viable, and can even 
produce superior individuals in some 
environments. That is how evolution 
works! 

5. Some genes regulate the expression 
of other genes in the body. In addition,
some of the DNA lying outside the
boundaries of protein-encoding genes,
formerly believed to be junk, serves
as a regulator of genetic expression.
Microenvironmental factors
within the cell can also control
genetic expression. 

6. A genetic influence may not be 
expressed for some time after birth, 
or in some cases may be suppressed 
throughout the individual’s life. 


at least 1% of the population is called a 
single nucleotide polymorphism (SNP or 
“snip”). There are about ten million SNPs 
in humans. Blocks of SNPs that are close 
together on the same chromosome, and so 
tend to be inherited together, are called haplotypes.
The SNPs themselves may be in a
protein-encoding gene, in a region between
genes, or may be part of the nonencoding
portion of a gene. One of the major steps
in research on genetic differences involves
locating haplotypes that differ between two or
more populations of interest, such as indi¬ 
viduals who do or do not have a particular 
form of mental disability. 

Panel 8.9 presents a few more facts about 
the genome. The next two sections discuss 


some of the many genetic disorders of 
cognition. 

8.4.1. Genetic Pathologies of Intelligence 

A person is considered to be mentally dis¬ 
abled if his or her intelligence, as measured 
by standard tests, falls below an IQ of 70. 
This is obviously an arbitrary standard, for 
a great many people who are considered to 
have a mental disability have IQ scores in 
the 70-80 range. From the behavior genet¬ 
ics viewpoint, the mentally disabled fall into 
three distinct groups. 

One group consists of people who have 
a cognitive deficiency with a known genetic 
cause. The mental deficiency is often only 
part of the syndrome, which may often
include other organic problems. There are 
approximately 300 such syndromes known 
at present, and no doubt more will be dis¬ 
covered. In some cases there is a general 
mental deficiency; in others deficiencies may 
be unevenly distributed, although some gen¬ 
eral deficiency is usually found. The genes 
that have been identified as causes for these 
deficiencies are found on several different 
chromosomes. However, the rate of occur¬ 
rence on the X chromosome is higher than 
would be expected by chance. 56 

Most of the genes associated with severe
mental retardation are recessive. Accordingly,
when such a gene lies on the X chromosome,
a man, who has only one X chromosome, can
suffer ill effects if he inherits a single copy
of the allele associated with the pathology.
Women, with two X chromosomes, have to
inherit two copies of the allele in order to
be affected.

The following examples of known genetic 
disorders have been chosen to illustrate the 
variety of syndromes that can occur. One 
of the examples is expressed in infancy, one 
in infancy and childhood, another in mid¬ 
dle age, and the last is usually not expressed 
until old age. In the first three examples the 
genetic anomaly is both sufficient and neces¬ 
sary for the disorder to occur. In the fourth 
example the genetic anomaly increases the 
risk of the cognitive disorder, but is neither 
a sufficient nor a necessary condition for the 
disorder. 

Phenylketonuria (PKU). This is one of
the best understood of the genetic mental 
deficiencies. PKU is due to a mutation in 
the PAH gene located on chromosome 12. 
PAH codes for synthesis of phenylalanine 
hydroxylase, an enzyme that is important 
in the metabolism of phenylalanine, a sub¬ 
stance that is found in certain foods, includ¬ 
ing artificial sweeteners. Untreated PKU can 
result in severe mental retardation. Fortu¬ 
nately the condition leading to PKU can be 
detected at birth, by an inexpensive blood 
test. Affected individuals have to maintain a 
restricted diet for a number of years. There 
are some indications that older children and 
adults who cease to follow the diet may be 

56 Inlow & Restifo, 2004. 


at risk for mild mental retardation, or at least 
a lower tested intelligence level than would 
be expected otherwise. In the extreme case 
a pregnant woman who is a carrier of the 
mutation may affect her child, not necessar¬ 
ily by passing on the gene, but by the harm¬ 
ful effects of a buildup of phenylalanine in 
her body during pregnancy. 

Estimates of the frequency of PKU range 
from 1 per 13,500 to 1 per 19,000 births, 
varying across ethnic groups. Screening and 
treatment for PKU has been widespread 
since the 1960s, so the clinical form of the 
condition is now fairly rare in the industri¬ 
ally developed countries. This makes PKU a 
prototypical example of a gene-environment 
interaction. It is a disaster for children born 
without access to modern health care, but 
a treatable problem for children who have 
access to health care. 

Fragile X syndrome. People suffering from 
fragile X syndrome display impulsiveness, 
difficulties in concentration, and, possibly 
concomitantly, some degree of mental retar¬ 
dation. The condition affects 1 in 4,000 men 
and 1 in 8,000 women. It is due to a mutation
in the FMR1 gene on the X chromosome.
This gene is involved in production of
a protein that is important in, among other 
things, nerve synthesis. 

Molecular geneticists have been able to go 
inside the gene, pointing to the manner of 
mutation that influences the gene's expres¬ 
sion. In normal individuals the CGG codon 
occurs in a block of from ten to forty repetitions
within the FMR1 gene. In affected
individuals the block may contain from 200 
to 1,000 repetitions. The repetitions block 
the gene's normal production of the protein. 

Huntington's disease. Huntington’s dis¬ 
ease was discussed earlier as an illustra¬ 
tion of a pathological gene that does not 
express itself until after a person reaches 
or passes reproductive age. See the discus¬ 
sion of Woody Guthrie and his family, pre¬ 
sented in panel 8.1. The syndrome is caused 
by a dominant allele on chromosome 4, so 
affected individuals need have only one copy 
of the gene in order to express symptoms. 

In general, if a pathological allele is 
dominant it quickly disappears from the 
population. Huntington’s disease represents
an insidious situation. The pathology does 
not manifest itself until after the affected 
individual is well into his or her reproduc¬ 
tive period. Therefore, the disease gene can 
be passed on to the next generation before 
the illness appears. 

As is the case with fragile X syndrome, 
the pathological allele has an abnormal num¬ 
ber of repeats of a codon. Increasing num¬ 
bers of repeats are associated with earlier 
onset of the syndrome. There is also evi¬ 
dence that the length of the repeat increases 
over generations, so if a child of a person 
with Huntington’s disease carries the allele 
he or she is at risk for developing symptoms 
earlier than the parent did. 

Alzheimer's disease. 57 Alzheimer’s disease
is a common and feared disease of the 
elderly. As of 2008, estimates of the num¬ 
ber of people suffering from Alzheimer’s 
dementia in the United States ranged from 
4.5 to 7 million, out of a total population 
of just over 300 million - about 2% of the 
population. As the age distribution shifts 
toward higher percentages of elderly peo¬ 
ple, the incidence of Alzheimer’s dementia 
will increase. If present incidence rates con¬ 
tinue there will be eleven to sixteen million 
cases of the disease in the United States by 
2050. Progress toward effective treatment or 
prevention has been disappointingly slow. 
Research on this economically and socially 
devastating disease is being given very high 
priority. 

The disease comes in two forms. Early 
onset Alzheimer’s is defined as Alzheimer’s 
disease that manifests itself prior to age 65. 
Cases have occurred as early as the mid-forties.
The more common variety, late onset
Alzheimer’s disease, appears after age 65.
The initial symptoms are failures of
short-term memory and attention, progress¬ 
ing to profound loss of long-term memory, 
including loss of the ability to recognize 
spouses and relatives known for fifty years or 

57 Information on symptoms, causation, and incidence
downloaded from the Alzheimer’s Association website
and the National Institute on Aging website,
March 2008.
more, loss of speech, and eventually death.
Because intensive care may be required for 
a period of years the financial and emotional 
impact on caregivers can be very high. 

The proximal cause of the memory loss is 
widespread deterioration of the nerve cells 
in the brain, beginning with the frontal cor¬ 
tex and the hippocampus. This is not sur¬ 
prising, as these structures are important for 
the development of new memories. Neural 
deterioration is accompanied by the formation
of beta amyloid protein plaques in the
brain. 

Three genes have been identified as in¬ 
creasing the risk of early onset Alzheimer’s, 
but they are not sufficient to account for all 
cases of the disease. 

Late onset Alzheimer’s has been associ¬ 
ated with several genes, among them the 
Apolipoprotein E (APOE) gene on chromo¬ 
some 19. The gene has three alleles, APOE2, 
APOE3, and APOE4. About half the pop¬ 
ulation carries the APOE2 allele. The high- 
risk form of the allele, APOE4, is relatively 
common (15% of the European-derived pop¬ 
ulation in the United States). About 50% 
of diagnosed Alzheimer’s patients carry the 
APOE4 form, but less than half of the peo¬ 
ple who carry APOE4 express the disease. 
For those who are affected, the age of onset 
is related to the genetic load. Patients with 
genotype E4/E4 have a mean onset of sixty- 
eight years of age, patients carrying one 
E4 allele have a mean onset of seventy-five 
years, and patients with no E4 alleles have a 
mean onset of eighty-four years. 

None of the genes linked with either type 
of Alzheimer’s disease has a strong enough 
association to be considered the sole, or even 
a necessary, cause of the disease. Identify¬ 
ing a person’s genotype makes it possible 
to assess the risk of developing the demen¬ 
tia, but we cannot say “for certain" that the 
dementia will or will not occur. 

A number of environmental factors 
have been linked (somewhat tenuously) to 
Alzheimer’s disease. These include severe 
inflammations of the brain and head injury. 
It has been suggested that industrial air 
and water pollutants may increase the risk 
of the disease. This has yet to be proven. 
Such a multiplicity of possible causes is
not surprising. The proximal mechanism of 
deterioration is neural degeneration, pos¬ 
sibly due to the development of protein 
plaques in the brain, degeneration of the 
synapses, and other insults to the brain itself. 
The deterioration can probably be initiated 
by several environmental and genetic haz¬ 
ards, which would act as distal causes. 

Alzheimer's disease represents a common 
situation, in which both genetic and envi¬ 
ronmental factors increase the risk of expres¬ 
sion of a pathology, but there does not 
appear to be any one factor that is either 
sufficient or necessary for the expression. 

8.4.2. Mental Pathologies Related 
to Chromosomal Abnormalities 

Several mental disabilities are caused by 
anomalies in the duplication of chromo¬ 
somes, rather than by variations in genes. 

Down’s syndrome. Down's syndrome is
characterized by poor physical develop¬ 
ment, mild to moderate mental retardation, 
and facial features that include elliptically 
shaped eyelids. It is fairly common, occur¬ 
ring in approximately 1 of every 800 births. 
The syndrome is caused by the presence of 
an extra chromosome 21. Down's syndrome 
sufferers usually die at a fairly early age, as 
the syndrome includes defects in the cardio¬ 
vascular system as well as cognitive prob¬ 
lems. Adult Down’s syndrome patients dis¬ 
play premature aging, including the physical 
deteriorations associated with advanced age 
and early Alzheimer's disease. 

Relatives of a person with Down’s syn¬ 
drome are at risk of producing a Down’s 
syndrome child. The risk is strongly related 
to the mother's age at birth. The risk rises 
from less than 1 in 1,000 births to women 
under thirty years old to 1 in 100 for women 
over forty. 

Turner's syndrome. 58 Turner's syndrome
is a condition that affects women born with a 
single X chromosome. Turner’s patients dis¬ 
play poor spatial-perceptual reasoning, the 
P and R components of the VPR model. 

58 Ross, Zinn, & McAuley, 2000. 


Verbal reasoning is unaffected. Affected 
women may also display difficulties with 
tasks that place high demands on the com¬ 
plex of working memory and attentional 
control, the behaviors labeled “executive 
functioning.” Physical signs may appear. 
These include short stature, in some cases 
markedly thickened necks, and late and 
sometimes incomplete development of sex¬ 
ual characteristics. The women are also at 
risk for cardiovascular problems. All of these 
remarks have to be qualified by the fact of 
individual variation. 

Turner's syndrome cases vary greatly. 
Some of these differences are tied to envi¬ 
ronmental variables. Girls who receive both 
growth hormone and estrogen replacement 
therapy may reach heights in the normal 
range and may be free of the physical signs 
associated with the syndrome. Support¬ 
ive within-family social environments are 
important. Estrogen therapy appears to be 
effective in countering deficiencies in exec¬ 
utive control functions, although it does not 
counter the deficiency in spatial-perceptual 
reasoning. This is interesting, because dif¬ 
ferent brain structures are involved in exec¬ 
utive control and spatial-visual reasoning. 
Clinically, the result is encouraging because 
a great many of our everyday spatial- 
perceptual tasks can also be accomplished 
using verbal strategies. 

Turner's syndrome is, like PKU, an exam¬ 
ple of a gene-environment interaction. A 
baby girl born today with Turner's syndrome 
has a far better outlook than she would have 
had one hundred years ago, providing that 
she has been born into a social environment 
that can provide the necessary support. 

XYY syndrome. Some males are born with 
two Y chromosomes. The genotype is asso¬ 
ciated with mild mental retardation. XYY 
males are typically large and fairly robust. 
During the 1960s it was suggested that XYY 
males are overly aggressive, to the point of 
exhibiting dangerous criminal behavior. The 
evidence for this was that there is an ele¬ 
vated incidence of XYY men among crim¬ 
inal populations. 59 However, it does not 

59 Jarvik, Klodin, & Matsuyama, 1973. 
follow from this observation that XYY men
are likely to become criminals. Studies of 
XYY’s in prison populations can establish 
the conditional probability of being XYY, 
given that the individual is known to be a 
(tall) criminal, Pr(XYY|crime), but that is
not the same thing as establishing the probability
of being a criminal, given that the
individual is known to be an XYY. The latter
figure, Pr(crime|XYY), is the statistic that
would have to be high in order to justify 
proactive monitoring of XYY cases. This is 
difficult to establish, as the condition occurs 
only in approximately 1 in 1,000 male births. 
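
The gap between the two conditional probabilities can be made concrete with Bayes' rule. The sketch below is only an illustration. The 1 in 1,000 base rate comes from the text, but the other two inputs (the share of XYY men among criminals and the share of all men who are criminals) are hypothetical values chosen to show the arithmetic, not estimates from any study.

```python
def prob_crime_given_xyy(pr_xyy_given_crime: float,
                         pr_crime: float,
                         pr_xyy: float) -> float:
    """Bayes' rule: Pr(crime|XYY) = Pr(XYY|crime) * Pr(crime) / Pr(XYY)."""
    return pr_xyy_given_crime * pr_crime / pr_xyy


# From the text: about 1 in 1,000 male births is XYY.
PR_XYY = 0.001
# Hypothetical: XYY men are three times as common among criminals.
PR_XYY_GIVEN_CRIME = 0.003
# Hypothetical: 5% of all men are classed as criminals.
PR_CRIME = 0.05

pr_crime_given_xyy = prob_crime_given_xyy(PR_XYY_GIVEN_CRIME, PR_CRIME, PR_XYY)
print(f"Pr(XYY|crime) = {PR_XYY_GIVEN_CRIME:.3f}")
print(f"Pr(crime|XYY) = {pr_crime_given_xyy:.3f}")
```

Under these assumed inputs a threefold overrepresentation of XYY men among criminals still leaves most XYY men with no criminal history, which is why the prison studies alone cannot justify monitoring.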

Two small prospective studies have been 
done in which XYY’s were identified and 
then followed for some time. 60 In both stud¬ 
ies the incidence of criminality among XYY 
men was elevated above the general popu¬ 
lation but not above the population of men 
of comparable IQ. An analysis of the types 
of crimes committed indicated that the elevated
criminality may be mediated by low
intelligence levels, rather than heightened 
aggression. 

Klinefelter’s syndrome. Klinefelter’s cases 
are men who have genotype XXY. Kline¬ 
felter’s syndrome cases are fairly large men, 
although they are not unusually robust. 
The commonest cognitive characteristic is 
poor language development, along with 
an elevated incidence of mild to moder¬ 
ate mental retardation. This is consistent 
with an fMRI study that revealed decreased 
hemispheric specialization during language 
processing among Klinefelter’s patients, as
compared to control groups. 61 Sexual devel¬ 
opment is retarded. There is some indica¬ 
tion that hormone replacement therapy can 
ameliorate the symptoms of this condition. 62 

Various types of mental retardation affect 
slightly under 3% of the population. This 
estimate excludes people with Alzheimer’s 
disease, which is usually considered a prob¬ 
lem associated with old age rather than 
a mental disability. The mentally disabled 

60 Gotz, Johnstone, & Ratcliffe, 1999; Witkin et al., 1976.

61 Van Rijn et al., 2008. 

62 Hazlett, 2005. 


impose a substantial burden on our social 
support systems. Finding techniques for 
ameliorating or effectively eliminating the 
consequences of these genetic anomalies, 
as we have with PKU, would have major 
moral, social, and economic rewards. 

8.4.3. The Genetic Basis of Normal
Variability in Intelligence 

The molecular genetic analysis of normal 
variation in intelligence presents a very dif¬ 
ferent picture than the analysis of patholog¬ 
ical conditions. The first thing to say, and 
to say loudly, is that there is no one gene, 
or even a small number of genes, responsi¬ 
ble for normal variations in intelligence. If 
there were one, we would have found it by 
now, for the techniques that have been used 
are quite adequate to identify any gene that 
accounted for 30% or more of the variation 
in intelligence within the normal range. No 
such gene has been found, so we may be 
pretty sure that it is not there. 

We are clearly dealing with a poly¬ 
genic inheritance model; lots of genes have 
their influence, but no one of them is 
the key gene. One of the most prominent 
researchers in the field, Robert Plomin, has 
speculated that variation in any one gene 
will control at most .5% of the variance in 
intelligence, as expressed in test scores. 63 
This may be a bit pessimistic, for researchers 
have found at least two genes, the CHRM2 
gene on chromosome 2 and the SNAP-25 
gene on chromosome 20, where variations in 
alleles may account for as much as 3-4% of 
phenotypic variation. 64 However, such find¬ 
ings have to be replicated. And here is why. 

Population-wide studies of genetics assess 
covariances between allele frequencies 
and phenotypical variations simultaneously, 
across many polymorphisms in the geno¬ 
type. Because very many polymorphisms are 
being studied simultaneously, there is a sub¬ 
stantial chance of finding polymorphisms 
that, by chance, happen to covary with 
the phenotypical measure in the sample, 

63 Plomin, Kennedy, & Craig, 2006. 

64 Dick et al., 2007; Gosso et al., 2006. 
even though there is no covariation between
the allele frequencies and the phenotypical 
measure in the population. Such a finding is 
called a false positive. 
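
How quickly false positives pile up can be seen from a short calculation. The sketch below assumes that every polymorphism is tested separately at a conventional significance level of .05, that the tests are independent, and that none of the polymorphisms is truly related to the trait. These are simplifying assumptions of mine, made for illustration only. The function names are likewise my own.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected count of chance 'hits' when no true effect exists."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Chance of one or more false positives across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float) -> float:
    """Per-test significance level that holds the overall rate near alpha."""
    return alpha / n_tests


ALPHA = 0.05  # assumed conventional per-test significance level
for n in (1, 100, 1_000_000):
    print(n,
          expected_false_positives(n, ALPHA),
          round(prob_at_least_one(n, ALPHA), 4),
          bonferroni_threshold(n, ALPHA))
```

With a million polymorphisms screened, tens of thousands of chance associations are expected at the .05 level. Tightening the per-test threshold controls this, but at the cost of making small real effects harder to detect.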

The converse problem is false negatives. 
Suppose that Plomin’s conjecture is right, 
that the covariation between allele frequen¬ 
cies in any one gene and intelligence test 
scores accounts for less than 1% of the total 
variation in test scores. Unless the sample is 
very large the chances of detecting a small 
covariation are not good. 
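
A rough power calculation shows what "very large" means here. The sketch treats the gene-intelligence relation as a simple correlation whose square is the proportion of variance explained, and uses the standard Fisher z approximation for testing a correlation. The .5% figure is Plomin's conjecture from the text. The significance levels and the 80% power target are conventional choices that I am assuming, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_to_detect(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power to detect a correlation r with n people."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = math.atanh(r) * math.sqrt(n - 3)  # Fisher z, standard error 1/sqrt(n-3)
    upper_tail = 1.0 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def sample_size_needed(r: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate sample size needed to reach the requested power."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_crit + z_power) / math.atanh(r)) ** 2 + 3)


# A gene explaining 0.5% of the variance corresponds to r = sqrt(0.005).
R_SMALL = math.sqrt(0.005)

for n in (200, 1000, 5000):
    print(n, round(power_to_detect(R_SMALL, n), 3))

print("n for one test at alpha = .05:", sample_size_needed(R_SMALL))
print("n at a stricter alpha of 5e-8:", sample_size_needed(R_SMALL, alpha=5e-8))
```

Under these assumptions a study of a few hundred people has little chance of detecting such an effect, and the required sample grows several-fold once the significance level is tightened to guard against false positives.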

The false positive and false negative 
effects play off against each other. Any 
widespread screening study maximizes the 
chances of finding false positives. When it 
comes to replication, though, a small effect 
can dip below the level of detectability in 
the replication sample. This makes the busi¬ 
ness of gene locating both time-consuming 
and expensive. That is the way it is. 

8.4.4. Techniques for Identifying Genes 
Associated with Normal Intelligence 

Several techniques have been used to iden¬ 
tify genetic contributions to intelligence. 
Some of the technologies are described in 
panel 8.10. Here I look at the logic rather 
than the mechanics of any one method. 

In a “bottom up” technique a gene is iden¬ 
tified that is known to be associated with 
some physiological function that might, rea¬ 
sonably, be linked to intelligence. A study 
is then initiated in which people are given 
some form of intelligence test and have 
their genotypes determined. This was the 
method used to identify the CHRM2 and
SNAP-25 genes. Both are believed to be linked
to the development of an efficient neural 
system. 

Sometimes this method works out. But 
sometimes it does not. For instance, we 
know that brain volume is related to intel¬ 
ligence, and that the connection is very 
largely genetic. 65 Accordingly, it is not sur¬ 
prising that there was considerable excite¬ 
ment when it was found that variations 
in the allele frequencies of two genes, 

65 Posthuma, De Geus, & Boomsma, 2004. 


MICROCEPHALIN (chromosome 8) and 
ASPM (chromosome 1), both known to
be involved in pathological failures of the 
development of brain size, exhibit evidence 
of strong evolutionary pressures and, in addi¬ 
tion, have a worldwide distribution suggest¬ 
ing that the pressure occurred about the 
times of the migration of Homo sapiens 
out of Africa and, much later, the development
of agriculture. 66 However, subsequent
research showed that allele variations
in these genes were not reliably associated 
with intelligence, within the normal range, 
or, for that matter, with head size and brain 
volume within the normal range. 67 

Failures to replicate or to show hypoth¬ 
esized correlations do not uncover bad sci¬ 
ence in the original studies. Given the sta¬ 
tistical issues involved, the only way for the 
field to proceed is for scientists to publish 
their findings, and then call for replication. 

An alternative strategy for finding the 
genetic basis of intelligence is to start from 
the "top down," by identifying individuals 
who differ in intelligence and then contrast¬ 
ing their genotypes. This is called genome¬ 
wide association. One such study compared 
DNA from children who had been identified 
as showing a very high level of mathemat¬ 
ical ability to DNA from a normal control 
group. The two groups differed in the fre¬ 
quency of an allele of a gene associated with 
the synthesis of insulin. 68 

Because many DNA sequences are being 
screened at once, genome-wide association 
is prone to picking up false positives. The 
only feasible way to counteract this is to 
replicate findings. And, alas, when this 
particular study was replicated the allele- 
intelligence correlation was not found. 69 

Linkage analysis is usually applied in con¬ 
junction with the study of related individu¬ 
als - the pedigree method applied to people 
rather than to plants. Linkage analyses can 
identify segments of DNA where polymor¬ 
phisms are correlated with some measure 

66 Evans et al., 2005; Mekel-Bobrov et al., 2005. 

67 Mekel-Bobrov et al., 2007; Rushton, Vernon, & 

Bons, 2007; Woods et al., 2006. 

68 Chorney et al., 1998. 

69 Hill et al., 2002. 
Panel 8.10. Methods for Screening
for Genetic Effects 

This panel describes designs used to 
establish a correlation between variations 
in alleles (polymorphisms) and variations
in a phenotypic trait, such as intelligence. 

Differences between individual genomes
(polymorphisms) occur when the
sequence of nucleotides differs at one or 
more positions, as in the sequence A C G 
T A A compared to A C C T A A, where 
there is a difference between G and C in 
the third position. As the text explains, 
a variation such as this is called a single 
nucleotide polymorphism (SNP, “snip”).
A SNP can occur inside or outside the 
boundaries of protein-encoding genes. 
Segments of DNA that contain one or 
more SNPs are referred to as alleles of 
that segment, a generalization of the idea 
that an allele is a variant of a gene to the 
idea that an allele is a variant of a SNP. 
SNPs occur roughly every 100 to 300 base 
positions along the genetic code. 

A genetic marker is an identifiable seg¬ 
ment of the sequence of bases in the 
genome. Genetic markers are used to 
identify segments of the genome within 
each chromosome. To take an oversimplified
example, suppose that two markers,
M1 and M2, have been identified, and
that in one person the sequence of DNA
between markers is M1 A C G T A A M2
and in the other the sequence is
M1 A C C T A A M2. There is a SNP at the
third position. This means that there are
at least two alleles of the genetic sequence
between M1 and M2.
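
The comparison in this oversimplified example can be written out in a few lines. The sketch below assumes that the two sequences between the markers have already been lined up and are of equal length. Real genomic data require an alignment step first, and a difference between two people counts as a SNP only if it is common enough in the population. The function name is my own.

```python
def find_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (position, base_a, base_b) wherever two aligned sequences differ.

    Positions are counted from 1, as in the panel's example.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, base_a, base_b)
        for position, (base_a, base_b) in enumerate(zip(seq_a, seq_b), start=1)
        if base_a != base_b
    ]


person_1 = "ACGTAA"  # sequence between the two markers in one person
person_2 = "ACCTAA"  # sequence between the same markers in another person

print(find_differences(person_1, person_2))  # [(3, 'G', 'C')]
```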

Participants in a study donate DNA 
samples and are measured on the phe¬ 
notypic trait, perhaps by being given an 
intelligence test. The DNA samples are 
then broken up into segments between 
genetic markers. The data is examined 
for correlations between the presence of 
alleles and the values of the phenotypic 
trait. For instance, if one allele is statisti¬ 
cally associated with high IQ scores and 


another with low scores, then a genetic 
association with IQ has been located. 

The technique works best if the par¬ 
ticipants differ in their genotypes in 
some systematic way. Therefore link¬ 
age analyses are done using samples of 
related individuals. For instance, regis¬ 
trants in the Dutch twin registry and 
their relatives have participated in linkage 
analyses. 

Linkage analysis is highly useful for 
tracing the genetic basis of discrete traits, 
with restricted reaction ranges, such as 
eye color. It can be applied to contin¬ 
uously measured traits, such as intelli¬ 
gence, but there may not be enough reli¬ 
ability in measuring the phenotypic trait 
to make the technique precise. This is 
especially true if, as is the case for cog¬ 
nitive measures, expression of the trait 
is affected by environmental as well as 
genetic factors. We know that this is 
the case for intelligence, for MZ twins, 
with identical genotypes, do not necessar¬ 
ily have the same test scores. Therefore, 
another method, genome-wide analysis, is 
used with continuous traits. 

Genome-wide analysis depends on a 
technology for the detection of genetic 
expression by microarrays. A microarray 
is a small chip that is divided into cells, or 
“dots.” In the screening application each 
dot contains a sequence of DNA located 
around one of the values of a previously 
identified SNP. Therefore, the segment of 
DNA in each dot is essentially a fragment 
of the allele of a SNP carried by one indi¬ 
vidual in the sample. Chips capable of 
evaluating a million SNPs have become 
available. 

DNA samples are taken from individ¬ 
uals from one or more populations that 
vary on the trait in question. The genetic 
material from all individuals is pooled, to 
create a solution of DNA and other mate¬ 
rial. (The process, which is rather com¬ 
plicated, will not be further described.) 

The microarray chip is then immersed in 
the DNA. Because complementary DNA 
strands will bind to each other, DNA that 
matches the DNA at a dot on the chip 


will bind to that location. The amount 
of binding at each spot is then com¬ 
pared across groups, to see if differences 
in binding can be related to group mem¬ 
bership. A case of this sort is described in 
the text. 


of cognitive competence, usually intelli¬ 
gence test scores. (The technique would 
work just as well with measures of work¬ 
ing memory, attention, or any other mea¬ 
surable trait, including such things as height 
and eye color.) The reason for using related 
individuals is that the occurrence of the 
polymorphism can be traced over genera¬ 
tions and across different relationships, in 
order to establish a correlation between the 
polymorphism and the trait of interest. To 
date, segments of DNA that covary with 
intelligence test scores have been located 
on chromosomes 2 and 6. 70 However, we 
have to wait for replication to solidify these 
results. Once this is done, further studies 
can be initiated to locate the genes within 
or near the segment, including non-protein- 
encoding DNA regulator sequences, and to 
find out what they do. 

Progress toward identifying the genes 
associated with normal variations in cogni¬ 
tion has been slow. In a review of progress 
as of 2006, Robert Plomin, Kennedy, and 
Craig pointed out that discoveries in the 
field have followed a discouraging pattern. 
Initially genes emerge as reasonable can¬ 
didates for the “intelligence gene” because 
of their involvement with neural efficiency, 
brain development, or some other key func¬ 
tion that ought to be related to intelligence. 
Interesting associations are found at first, but 
the associations do not appear on replica¬ 
tion. The fact that this pattern has appeared 
so frequently is what led Plomin to specu¬ 
late that no gene, alone, accounts for more 
than .5% of the variance in intelligence. 71 

70 Luciano et al., 2006; Posthuma et al., 2005. 

71 Plomin, Kennedy, & Craig, 2006. Plomin reiterated
this theme in an address to the International Society
for Intelligence Research, Amsterdam, December
2007.


However, from quantitative genetic studies 
we know that in the populations studied at 
least half, and possibly more, of the variation 
in intelligence is due to genetic variation. It 
seems clear that we are dealing with a very 
large number of genes, each of which has a 
small effect. Smoking slingshots, not smoking
guns!
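
The arithmetic behind "a very large number of genes" is simple. The sketch below takes the two figures from the text (at least half the variation is genetic, and no one gene accounts for more than .5%) and assumes, purely for illustration, that the contributions of separate genes simply add up without overlapping.

```python
import math
from fractions import Fraction

# Figures from the text, held as exact fractions to avoid rounding surprises.
GENETIC_SHARE = Fraction("0.50")       # at least half the variance is genetic
MAX_SHARE_PER_GENE = Fraction("0.005")  # Plomin's conjectured ceiling of .5%

# Assumption: gene effects are additive and do not overlap.
min_genes = math.ceil(GENETIC_SHARE / MAX_SHARE_PER_GENE)
print("Minimum number of contributing genes:", min_genes)  # 100
```

Since most genes would be expected to contribute well under the ceiling, the true number would be considerably larger than this lower bound.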

The search for the genes for intelligence 
is not a hopeless project. Solid results from 
quantitative behavior genetics show that the 
target genes exist. The technology for asso¬ 
ciating genes with traits is getting better and 
better. Ultimately, we will locate the genes 
for intelligence and will understand their 
modes of action. This will be a huge step for¬ 
ward in understanding the biological basis 
of intelligence. Unless there is a spectacu¬ 
lar breakthrough in technology, the search 
is going to take a lot longer than people 
thought it would at the start of the twenty- 
first century. 

But be of (some) good cheer. The last fifty 
years of the twentieth century and the first 
decade of the twenty-first did see spectacu¬ 
lar breakthroughs in the technology dealing 
with molecular genetics. Who knows what 
is to come? 

8.5. Reprise and Commentary 

The facts are incontrovertible. Human intel¬ 
ligence is heavily influenced by genetics. 
Specific genetic anomalies are clearly asso¬ 
ciated with identifiable mental abilities, and 
on a statistical basis substantial heritability
coefficients have repeatedly been found
in studies throughout industrial and postindustrial
societies. The exact value for the
heritability coefficient, h², may vary somewhat
from study to study, but that is to
be expected. It is never zero, and it is
never one.

Hopefully, future studies will shift from 
trying to find a “true value" for this coef¬ 
ficient to determining what characteris¬ 
tics of a population determine the relative 
importance of genetic and environmental 
influences on intelligence. The heredity- 
environment debate has too often been 
posed as a contest. It is not. We need to 
find mechanisms for both. An unacknowl¬ 
edged problem that has hampered the field 
is that while we have an excellent theory of 
genetic variation, we do not have a compa¬ 
rable theory of environmental variation. It is 
easy to point to good and bad environments 
for the development of cognitive power. It 
is much harder to provide some sort of met¬ 
ric to say how good or how bad a particular 
environment is. 

The current frontier in behavior genetics 
has shifted from trying to show that intelli¬ 
gence is inherited to trying to find the mech¬ 
anism of inheritance. A good deal of suc¬ 
cess has been obtained in tracing the genetic 
causes of serious cognitive deficiencies, such 
as Huntington’s disease and PKU. We have 
had much less success in identifying the 
genes associated with variations in normal 
intelligence. Why? The answer is simple: 
this is a harder problem. It may turn out that 
the strategy of looking for gene-intelligence 
correlations directly is the wrong approach. 
An alternate approach would be to look 
for the brain mechanisms that are associ¬ 
ated with intelligence, and then try to find 
the genes that influence the development of 
these mechanisms. 

The lay public has displayed an ambigu¬ 
ous attitude toward findings on the genetics 
of intelligence. I will revisit this topic in the 
final chapter of the book, where I look at 
social controversies more generally. Here I 
make only a few brief remarks. 

Public opinion has accepted an analogy 
between physical defects and extreme men¬ 
tal deficiencies, especially when accompa¬ 
nied by physical stigmata. The lay person 
believes that it is reasonable to look for a 
physical cause for an extreme mental defi¬ 
ciency, just as it is sensible to look for
a physical cause for, say, high cholesterol,
which does have a genetic component. Such 
research has received strong public, politi¬ 
cal, and financial support. There is consider¬ 
ably more ambiguity about efforts to iden¬ 
tify a genetic basis for intelligence, in the 
normal range. There are several reasons for 
this. 

Modern Western society distrusts hered¬
itary elites. The Age of Kings is definitely
over. On the other hand, it is demonstra¬ 
bly the case that health, wealth, and oppor¬ 
tunity for education, the trappings of high 
socioeconomic status, do pass on from one 
generation to the next. In the nineteenth 
century Galton concluded that this was 
largely a reflection of genetically endowed 
merit. Today that view is decidedly unpop¬ 
ular. Any suggestion that cognitive compe¬ 
tence is largely inherited is seen as deny¬ 
ing a popular legend - that you can be 
what you want to be, if you work hard 
enough. 

The view that Behavioral Genetics denies 
the importance of individual opportunity 
and effort is frustrating to those who believe 
in the importance of the partial genetic 
inheritance of intelligence, because no com¬ 
petent behavioral geneticist argues that 
mental competence is completely inher¬ 
ited. Behavior geneticists agree with educa¬ 
tors that socially important cognitive skills, 
which are far more important than test 
scores, can, within broad genetic limits, be 
acquired by a combination of education, 
experience, and effort. An argument for an 
inherited component of intelligence is not 
necessarily an argument against the value 
of education and effort. This point is often 
not appreciated by those who attack genetic 
models of intelligence. 

Discussion of the evidence for a genetic 
basis for intelligence has been confounded 
with discussions of differences in cogni¬ 
tive competence between racial and ethnic 
groups. Again the facts are quite clear; in the 
world today certain ethnic groups, notably 
African-derived populations, are getting a 
smaller share of the economic/social pie 
than European-derived populations. This 
has been going on for a long time. Claims of 
genetic influences on intelligence are seen by
some as tantamount to a sort of Darwinian 
justification of present economic and social 
inequalities. 

In fact, these are separate issues. The 
vast majority of current studies in behav¬ 
ioral genetics have no direct implications, 
whatsoever, for issues surrounding racial 
and ethnic differences in intelligence, sim¬ 
ply because they are studies of the varia¬ 
tion of genes and intelligence within rather
than across racial/ethnic groups, and gener¬
ally within just one ethnic group, European- 
derived populations. The causes of within- 
group variation are not necessarily the same 
as the causes of between-group variation. 
Unfortunately this has not stopped people 
from speculating, on both sides of the issue. 
This issue is discussed in some detail in 
Chapter 11. 

Now let us look at some environmental 
influences on intelligence. 


CHAPTER 9 


Environmental Effects on Intelligence 


More of your conversation would infect my 
brain. 

Shakespeare, Coriolanus, act 2,
scene 1 

Shakespeare was right (again). Every expe¬ 
rience we have leaves an imprint on our 
brains, and from it, on our minds. Clearly 
physical experiences can change the brain. 
It has been claimed that early species of 
Homo got a leg up on the evolutionary lad¬ 
der when they began to eat fish. 1 In modern 
days, lecithin, a substance found in a num¬
ber of foods, including fish and eggs, has
been studied to see if it can enhance learn¬ 
ing. (The results are mixed.) We do not stop 
there; we concern ourselves with the social 
environment. If we do not believe that social 
experiences can affect the brains of children, 
why do we have decency ratings for movies 
and television programs? And why would it 
be possible to sell video programs for infants 
with names like Baby Einstein? 2 How might
we manipulate the environment to improve
intelligence?

1 Broadhurst, Cunnane, & Crawford, 1998.

2 © The Disney Corporation.

There are two ways this question can be 
interpreted. The less interesting interpreta¬ 
tion is “Can the environment be manipu¬ 
lated to improve test scores?” A more inter¬ 
esting question is “Can the environment 
be manipulated to improve general mental 
competence?” The answer to both questions 
is “yes.” 

Environment is a catch-all term. In dis¬ 
cussing environmental effects on intelli¬ 
gence it is useful to make a distinction 
between the physical environment and the 
social environment. The physical environ¬ 
ment involves things like nutrition, air pol¬ 
lutants, and disease - anything that makes 
itself felt by direct physical action. The social 
environment involves things like education, 
social actions that enhance or threaten secu¬ 
rity, and opportunities for self-development 
of cognitive skills. Both the social and the 
physical environment alter the brain’s activ¬ 
ity and, as Shakespeare said, infect (leave 
a physical trace on) the brain. It does not 
always pay to go to this level of analysis. 
A teacher in an elementary school does not 
care whether a vocabulary-learning exer¬
cise will move brain activity related to
word recognition from the frontal cortex to 
the posterior temporal cortex. The teacher 
wants to know if the exercise will improve 
the way the students use words. It is impor¬ 
tant to keep our discussions at the appropri¬ 
ate level of analysis - Brunswikian symmetry 
again! 

9.1. Three Key Issues in the Study of 
Environmental Effects 

In much of the literature environmental 
effects are contrasted to genetic effects. This 
is best exemplified by the continuing argu¬ 
ments over genetic versus environmental 
causes of intelligence. A more sophisticated 
way of looking at this debate is to determine 
the extent to which genetic and environ¬ 
mental constraints limit behavioral poten¬ 
tial. This is not a simple task. Three issues 
have to be kept in mind. They are the con¬
cepts of reaction range, the distal-proximal
distinction, and the problem of collinear¬
ity. While these issues have been discussed
before, they assume special importance in 
the study of environmental effects. 

9.1.1. Reaction Range 

The concept of reaction range was intro¬ 
duced in Chapter 1, section 1.5. To review 
briefly, genetics determines a potential for 
the expression of a trait, but in most situa¬
tions the extent to which the trait is actu¬
ally expressed (if at all) is influenced by the
environment. At one extreme we have eye 
color, for which there is virtually no reac¬ 
tion range; at the other extreme are traits 
like alcoholism and Alzheimer's dementia, 
where a person inherits a risk that can be 
triggered by environmental events. Intelli¬ 
gence is decidedly the latter type of trait. 

We can think of the environment as 
determining where a given person will oper¬ 
ate, within his or her genetically specified 
reaction range. Because different people 
will operate within different environments, 
variations in behavior will be determined
both by differences between people in reac¬
tion range and differences in environmental 
influences. This is the reason that h² varies
in different situations; it reflects the relative 
importance of differences in reaction ranges 
and differences in environmental conditions 
that operate within the potential afforded 
by reaction ranges. 
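
The dependence of h² on the situation can be made concrete with a toy calculation. The sketch below is only an illustration: it assumes that genetic and environmental contributions are independent and simply add, which is a simplification not claimed in the text, and the function name and variance values are hypothetical.

```python
def heritability(var_genetic: float, var_environment: float) -> float:
    """Proportion of trait variance attributable to genetic variance.

    Assumes independent, additive genetic and environmental contributions,
    an illustrative simplification rather than a model taken from the text.
    """
    total = var_genetic + var_environment
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_genetic / total


# Hold genetic variance fixed and vary how much environments differ.
GENETIC_VARIANCE = 100.0  # hypothetical value
SCENARIOS = {"uniform": 25.0, "moderate": 100.0, "diverse": 300.0}

for label, var_env in SCENARIOS.items():
    h2 = heritability(GENETIC_VARIANCE, var_env)
    print(f"{label:>8} environments: h2 = {h2:.2f}")
```

With the genetic variance held fixed, the computed coefficient falls as environments become more varied, which is one way of seeing why a single "true value" of h² should not be expected.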

Environmental effects are often illus¬ 
trated by experiments in which two rela¬ 
tively extreme environments are contrasted 
with each other. Such an experiment shows 
what might happen under certain extreme 
conditions. While this information can be 
important, it does not tell us what is likely 
to happen under normal conditions of envi¬ 
ronmental variation. At this point we have 
to deal with a measurement issue. In order 
to define the extent of environmental varia¬
tion we have to have a metric specifying how
close two environments are to each other. 
At present no such metric exists, largely 
because we do not have a good theory of 
environmental variation. In fact, we have 
hardly any such theory at all. 

9.1.2. Proximal and Distal Causation 

As any parent of two or more children 
knows, people exert a lot of influence over 
their own environments. Where one person 
has experiences, another will gain knowl¬ 
edge. Recall Ackerman's emphasis on the 
importance of intellectual engagement upon 
the development of intelligence in adult¬ 
hood (Chapter 4). On the other hand, 
environments vary in the extent to which 
they encourage the acquisition of skills and 
knowledge. For instance, it is well estab¬ 
lished that parents in the higher socioeco¬ 
nomic status (SES) range are more likely 
to encourage their children to engage in 
exploratory and problem-solving activities 
than are parents in the lower SES ranges. 3 
Suppose that variations in the initial ten¬ 
dency to explore an environment have a 
partially genetic basis. How are we to inter¬ 
pret a study that demonstrates that children 
show greater cognitive development if they
are exposed to enriched environments that
encourage exploration? Would this be a dis¬
tal genetic or proximal behavioral effect?

3 Nisbett, 2009.

Figure 9.1. The design of a hypothetical experiment demonstrating
how the development of cognition may depend on environments
that encourage exploration. The environment acts as a proximal
cause of the level of cognition. Unknown to the experimenter, the
participating children are genetically divided into high and low
explorers. Exploration tendency acts as a distal variable, producing
variation in cognitive development within one environment but not
within the other.

In fact, many such studies have been 
conducted. Figure 9.1 shows a widely 
used design. Children (of unknown genetic 
potential for exploration) are randomly 
assigned to an environment that either 
encourages or restricts cognitive develop¬ 
ment. We suppose that, unknown to the 
experimenter, the children vary in their 
genetic tendency to explore. To take an 
extreme, assume that there are geneti¬ 
cally determined “high” and “low” explor¬ 
ers. Following the convention introduced in 
Chapter 1, the distinction is shown in ellip¬ 
soids, to designate an unobserved quantity. 
The experimenter can observe the sort of
environment that the children are exposed
to (center rectangles) and their cognitive 
performance (right-hand rectangles). If cog¬ 
nitive performance is better in the encourag¬ 
ing than in the restricting environment, the 
experimenter is justified in concluding that 
the environments influenced cognition, act¬ 
ing as proximal variables. However, genetic 
potential has also had an influence, as a 
distal variable. The restrictive environment 
does not permit genetic influences to act; 
the encouraging environment does; and the 
high explorers take more advantage of this 
opportunity than do the low explorers. 
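
The logic of this hypothetical experiment can be sketched as a small simulation. Everything numerical here is an assumption made for illustration (the baseline score, the size of the benefit, the noise level, and the equal split of explorers); only the structure, random assignment to environments plus an unobserved explorer type, follows the design described above.

```python
import random
from statistics import mean


def simulate(n_children=10_000, seed=0):
    """Return (explorer_type, environment, score) for each simulated child."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_children):
        explorer = rng.choice(["high", "low"])  # unobserved distal variable
        environment = rng.choice(["encouraging", "restricting"])  # random assignment
        score = 100.0 + rng.gauss(0, 5)  # hypothetical baseline plus noise
        if environment == "encouraging":
            # Only the encouraging environment lets the tendency to explore matter.
            score += 10.0 if explorer == "high" else 2.0
        rows.append((explorer, environment, score))
    return rows


def cell_mean(rows, explorer=None, environment=None):
    """Mean score for the children matching the given labels."""
    return mean(
        score
        for exp, env, score in rows
        if (explorer is None or exp == explorer)
        and (environment is None or env == environment)
    )


rows = simulate()
for env in ("encouraging", "restricting"):
    print(env, "overall:", round(cell_mean(rows, environment=env), 1))
    for exp in ("high", "low"):
        print("   ", exp, "explorers:", round(cell_mean(rows, exp, env), 1))
```

The experimenter sees only the overall difference between the two environments. The gap between high and low explorers appears inside the encouraging environment and is essentially absent inside the restricting one.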

Figure 9.1 presents the proximal-distal 
distinction as it might occur in an exper¬ 
iment, where the distinction is clear-cut. 
The situation can be much more confusing 
in the natural world. The extent to which
children are raised in an environment that 
encourages exploration varies with socio¬ 
economic status (SES). Generally, low SES 
children are raised in more restrictive envi¬ 
ronments than are the children of upper 
and middle SES families. Upper and mid¬ 
dle SES children also, on the average, have 
higher IQ scores than lower SES children. 
But it may be that the children in upper- and 
middle-class SES families are more likely to 
be genetically predisposed to explore than 
are lower SES children. It might be that 
the upper- and middle-class SES parents are 
genetically predisposed to interact with chil¬ 
dren in a way that encourages exploration. 
And there are other possibilities. It could be 
that higher SES families are subject to fewer 
social stresses, and therefore are better able 
to create an encouraging environment. Out¬ 
side of the laboratory proximal and distal 
variables are often thoroughly mixed up. In 
interpreting studies of the causes of intelli¬ 
gence this caution should be kept in mind. 

9.1.3. Collinearity

Collinearity refers to a situation in which 
several possible causes of a phenomenon are 
themselves correlated. Collinearity presents 
serious difficulties for anyone interested in 
determining environmental influences on 
cognition. To illustrate, there is a posi¬
tive correlation between family income and
children's test scores. Nisbett has suggested 
a simple economic solution to the prob¬ 
lem: providing poor people with subsidies 
to improve their and their children's life 
situation. 4 However, he has also pointed out 
that this would not help immediately, for 
family income is correlated with a number 
of other potential restrictions on children’s 
development, including nutrition and child- 
rearing practices. Nisbett does not stress the 
point, but SES, which includes education 
and income, is also correlated with genetic 
inheritance, for people tend to marry people 
of their own general social class, especially
with respect to education. 5 Providing poor
people with a subsidized income for child
rearing might improve nutrition and certain
other aspects of the environment immedi¬
ately, might improve child-rearing practices
over generations (a point Nisbett stresses),
and would probably have little effect on
genetic makeup.

4 Nisbett, 2009, p. 192.

The same thing happens on an interna¬ 
tional basis. Across nations, poor nutrition is 
associated with low test scores. Poor nutri¬ 
tion is most likely to occur in countries that 
have poor school systems, and to affect chil¬
dren whose parents have low IQ scores. So
what is causing what? 

Look at the issue symbolically. Let I be
intelligence and C₁ ... Cₖ be a set of pos¬
sible causes of intelligence. We observe a
correlation between I and C₁. But C₁ is
correlated with many of the other possible
causes, C₂ ... Cₖ, so the observed I, C₁ cor¬
relation might be due to any of the other
possible causes. And to make things just a 
little harder, we have to consider possible 
feedback mechanisms. By any conceptual 
definition, intelligence is adaptive behavior. 
People with high intelligence are, in gen¬ 
eral, able to cope with unfavorable phys¬ 
ical and social environments more effec¬ 
tively than people of low intelligence. To 
take a not unrealistic example, driving while 
intoxicated puts you at risk of severe head 
injury, which cannot be a good thing for 
your intelligence. But if you are intelligent, 
you are less likely to drive while intoxicated, 
compared to an unintelligent person. 

The problem of collinearity is not unique 
to studies of intelligence. However, it is 
unusually severe in this field. For ethical 
and practical reasons, it is seldom possible 
to avoid the collinearity problem by con¬ 
ducting controlled experiments, where the 
levels of various causes are manipulated by 
an experimenter. In situations where con¬
trolled experiments can be conducted, there
is often reason to question whether the lab¬ 
oratory results can be extrapolated to the 
day-to-day situation. The same criticism can
be leveled against simple regression mod¬
els, where one variable is “held constant”
by statistical means. In these studies inves¬
tigators determine the influence of causal
variable C₁ on the residual variance in intel¬
ligence (I) after removing variation in I due
to all other causal variables of interest. This
technique is a useful way of determining
how much of the variation in the intelli¬
gence test scores can be associated with each
of the predictors. We have to remember that
the statistical model implies that each causal
variable can be manipulated independently.
In practice this may not be possible. Modern
studies rely heavily on structural equation
modeling techniques similar to those used in
quantitative behavior genetics, as described
in the last chapter. This can go a long
way toward disambiguating collinearities.
Unfortunately, the resulting models become
complex.

5 Blackwell & Lichter, 2000.
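
The collinearity problem, and what statistical control can and cannot do about it, can be sketched with simulated data. The example below is a deliberately artificial assumption: a single variable c2 truly influences the outcome, while c1 merely travels with c2. The coefficients, sample size, and variable names are invented for the illustration, and the first-order partial correlation stands in for "holding a variable constant."

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y with z statistically held constant."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(1)
n = 20_000
c2 = [rng.gauss(0, 1) for _ in range(n)]               # the real cause (by assumption)
c1 = [0.8 * v + 0.6 * rng.gauss(0, 1) for v in c2]     # correlated with c2, no effect of its own
intel = [0.7 * v + rng.gauss(0, 1) for v in c2]        # outcome driven only by c2

r_raw = pearson(intel, c1)
r_partial = partial_corr(intel, c1, c2)
print(f"observed correlation of I with c1:     {r_raw:.2f}")
print(f"same correlation with c2 held constant: {r_partial:.2f}")
```

The raw correlation between the outcome and c1 is substantial even though c1 does nothing, and it vanishes once c2 is controlled. The sketch works only because the true cause was measured and included, which is exactly what cannot be guaranteed in real studies.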

Given all these analytical problems, 
understanding environmental effects on 
intelligence is going to be hard work. Will 
the results be worth the effort? Only if envi¬ 
ronmental effects are very large. The next 
section shows that they can be. 

9.2. How Much Can Environments 
Matter? The Cohort (Flynn) Effect 

In Chapter 8 we saw that the existence 
of substantial heritability coefficients, h²,
showed that there are genetic influences 
on intelligence, without revealing anything 
about how the genetic influences operate. 
There is an analogous situation with regard 
to the environment. People are growing 
smarter at a rate that is far greater than can 
be accounted for by genetic/evolutionary
effects. Senior citizens may find this hard to 
believe (unless they are talking about their 
grandchildren), but intelligence test scores
rose throughout the twentieth century. This 
is called a cohort difference, where a cohort 
refers to the group of individuals born at a 
particular time - for example, the cohort of 
people born in 1970. Throughout most of the 
twentieth century each successive cohort 
was more intelligent than its predecessors. 


The beginnings of studies of the cohort 
effect are interesting, because it appears 
the researchers expected to find something 
quite different from what they found. 

9.2.1. The Cohort Effect

In the 1940s Read Tuddenham, a professor 
at the University of California, compared 
intelligence test scores for White men who 
had enlisted in the US Army during World 
Wars I and II. 6 Tuddenham wanted to test 
R. B. Cattell's conjecture that intelligence 
should drop over time, because people with 
high intelligence test scores have fewer chil¬ 
dren than people with low test scores. 7 
Tuddenham found just the opposite. During 
the twenty-five year period between World 
War I (American participation 1917-18) and
World War II (American participation 1941-45),
the mean intelligence test score of young
White males in the United States increased 
by approximately one standard deviation 
unit. See panel 9.1 for more details of Tud¬ 
denham's work. 

In the 1950s K. Warner Schaie, who had 
worked with Tuddenham as an undergrad¬ 
uate, began graduate studies at the Uni¬ 
versity of Washington. His thesis advisor 
was Charles R. Strother, a well-known clin¬ 
ical psychologist interested in aging. At the 
time a puzzling anomaly had been noticed. 
Studies of aging used one of two designs. 
In a cross-sectional design people of differ¬ 
ent ages are tested at the same time. Thus 
a study in, say, 1955 might involve people 
who were in their twenties, thirties, for¬ 
ties, and so on. In a longitudinal study a 
group of people of more or less the same 
age is followed for some period of time. For 
instance, an investigator might begin a study 
in 1955, with participants in their twenties, 
and follow them as they aged over the next 
twenty to thirty years. Cross-sectional stud¬ 
ies led to the conclusion that cognitive func¬ 
tions begin to decline after a peak in the 
late twenties, with some differences in tim¬ 
ing for different functions. Studies using
longitudinal designs found that declines did
not begin until people were in their fifties or
sixties.

6 Tuddenham, 1948.

7 Cattell, 1940.

Panel 9.1. The Tuddenham Study

In 1917 the United States Army devel¬
oped the Army Alpha Test as a device
for screening recruits. In World War
II (American participation 1941-45) the
Army used a successor test, the Army
General Classification Test. In order to
compare the two tests the Army gave a
version of the World War I test to 768
World War II soldiers, selected to repre¬
sent the demographics of World War II
enlistees. The median score of the World
War II soldiers was approximately one
standard deviation higher (using World
War I norms) than the median score for
the World War I soldiers.

Tuddenham pointed to several possi¬
ble causes of the discrepancy. He felt that
the most important of these was that the
1945 soldiers had, on the average, much
more education than the 1917-18 soldiers.
He reinforced this conclusion by showing
that those World War I soldiers who were
literate had test scores reasonably close
to the scores of the World War II sol¬
diers. Figure 9.2 shows both the general
effect and the effect of introducing liter¬
acy as a covariate. In many ways Tuddenham’s
study anticipated discussions of the
cohort effect that were to emerge over
the next fifty years.

Schaie and Strother realized that both 
cross-sectional and longitudinal studies are 
confounded with cohort effects, but in dif¬ 
ferent ways. 8 A cross-sectional study com¬ 
pletely confounds age with cohort; if people 
are tested in, say, 2000, then all the thirty- 
year-olds will have been born in 1970, the 
forty-year-olds in 1960, and so forth. A lon¬
gitudinal study is conducted within a single
cohort; if people are followed from twenty
to fifty, beginning in 1980, all participants
will have been born in 1960. Schaie and
Strother offered an elegant solution to the 
design issue, the cohort-sequential design. 

In a cohort-sequential design a cross- 
sectional sample of people of varying ages 
(and birth cohorts) is collected at time 1. 
Then, after a period of time (in Schaie 
and Strother's case, seven years) a second 
cross-sectional design is conducted, drawing 
a sample from the same population, at time 
2. In addition, the first sample is contacted 
again, and as many people as possible are 
retested in a second wave. The procedure 
is repeated over as many testing waves as 

8 Schaie & Strother, 1968. 


possible. Cohort and age studies can now be 
analyzed separately. 

Suppose that two test waves have been 
conducted, in 1956 and 1963 (the actual 
test dates of Schaie and Strother’s first two 
waves). Consider the following four partici¬ 
pants: 

Person   Born   Tested in 1956   Tested in 1963
A        1936   at age twenty    at age twenty-seven
B        1916   at age forty     at age forty-seven
C        1943   not tested       at age twenty
D        1923   not tested       at age forty

There are two longitudinal contrasts:
between person A in 1956 and 1963, covering
the age span twenty to twenty-seven, and
between person B in 1956 and 1963, cover¬
ing the age span forty to forty-seven. There
are three cross-sectional contrasts. They are
between persons A and B, tested in 1956, at
ages twenty and forty; between persons A
and B, tested in 1963, at ages twenty-seven
and forty-seven; and between persons C and
D, tested in 1963, at ages twenty and forty.
There are two cohort contrasts: between per¬
son A, tested in 1956, and person C, tested in
1963, both at age twenty at the time of test¬
ing, and between person B, tested in 1956,
and person D, tested in 1963, both at age
forty at time of testing.

Figure 9.2. Scores on the Army Alpha Test obtained by World War
I soldiers, World War II soldiers, and literate World War I soldiers.
Scores are shown by increasing deciles - that is, the scores on the
far left are for the tenth decile of each group, and so forth. Data
from Tuddenham, 1948. [Legend: All WW I; Literate WW I; WW II.]

The cross-sequential design makes it 
possible to determine longitudinal, cross- 
sectional, and cohort effects. It does con¬ 
found date of testing with other effects, but 
I have never heard of any reason that date of 
testing, alone, should have an influence on 
test scores. 
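
The bookkeeping behind these contrasts can be sketched in a few lines. The records below encode the four hypothetical participants from the example; the "sample" field (1 for people recruited in 1956, 2 for people recruited in 1963) and the rule that cross-sectional contrasts are drawn within a sample are my reading of the example, not a formal definition given by Schaie and Strother.

```python
from itertools import combinations

# One record per test occasion: (person, sample, birth_year, test_year).
OBSERVATIONS = [
    ("A", 1, 1936, 1956), ("A", 1, 1936, 1963),
    ("B", 1, 1916, 1956), ("B", 1, 1916, 1963),
    ("C", 2, 1943, 1963),
    ("D", 2, 1923, 1963),
]


def classify(first, second):
    """Name the kind of contrast two test occasions provide, or None."""
    person1, sample1, born1, tested1 = first
    person2, sample2, born2, tested2 = second
    if person1 == person2:
        return "longitudinal"      # same person at two ages
    if tested1 == tested2 and sample1 == sample2:
        return "cross-sectional"   # same test date, different ages
    if tested1 != tested2 and tested1 - born1 == tested2 - born2:
        return "cohort"            # same age, different birth years
    return None


contrasts = {"longitudinal": [], "cross-sectional": [], "cohort": []}
for first, second in combinations(OBSERVATIONS, 2):
    kind = classify(first, second)
    if kind is not None:
        contrasts[kind].append((first, second))

for kind, pairs in contrasts.items():
    print(kind, len(pairs))
    for (p1, _, b1, t1), (p2, _, b2, t2) in pairs:
        print(f"    {p1} in {t1} (age {t1 - b1}) vs {p2} in {t2} (age {t2 - b2})")
```

Run on the four participants, the classification yields two longitudinal, three cross-sectional, and two cohort contrasts, matching the counts given in the example.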

In addition to its design aspects, Schaie 
and Strother’s study was noteworthy for 
its sophisticated measurements. Instead of 
relying on a single test score, Schaie and 
Strother used numerous tests to evaluate 
different aspects of intelligence. For our pur¬ 
poses the most interesting are their measures 
of inductive reasoning, verbal comprehen¬ 
sion, and spatial reasoning. These measures 
correspond well to the modern g-VPR the¬ 
ory: inductive reasoning maps onto g; ver¬ 
bal comprehension maps onto the verbal 
dimension; and the spatial reasoning task 
they used appears to me to be a mix of 
the perceptual and rotational dimensions in 
Johnson and Bouchard’s VPR model. 9 

9 Johnson & Bouchard, 2005. 


The 1968 article reported large cohort 
differences in all three types of reason¬ 
ing. They were invariably in favor of later- 
born cohorts, which is consistent with 
Tuddenham’s results. Schaie and Strother 
concluded that the rapid drop in intelligence 
after early middle age (thirty to forty) that 
had been observed in cross-sectional stud¬ 
ies was very largely a cohort rather than an 
aging effect. 

Schaie has continued and expanded the 
study, using the cross-sequential design, col¬ 
lecting data every seven years for over forty 
years! The study, which is now known as 
the Seattle Longitudinal Study, is described 
in panel 9.2. 

Figure 9.3 shows the effect of cohort 
changes since the birth cohort of 1903 (who 
were fifty-three at the time of the first 
testing) until the last testing in 1998. The 
results are typical of what is found in this 
field. Compared to the birth cohort of 
1903, all subsequent birth cohorts showed 
a steady increase in reasoning. The increase 
in spatial orientation ability was somewhat 
more uneven, but still marked. The abil¬ 
ity to extract verbal meaning increased until 
the birth cohort of 1952, and subsequently 
decreased. 
Panel 9.2. The Seattle Longitudinal
Study 

The Seattle Longitudinal Study grew out
of a fortuitous combination of interests, 
combined with K. Warner Schaie’s tena¬ 
cious pursuit of a good idea. Follow¬ 
ing his undergraduate work with Read 
Tuddenham at the University of Califor¬ 
nia, Berkeley, Schaie moved to the Uni¬ 
versity of Washington to begin studies 
for his Ph.D. with Charles R. Strother, 
a senior professor in clinical psychology 
who was very interested in public health 
issues. 

Strother had helped found the Seattle 
Group Health Program, one of the first 
large, open-access health maintenance 
organizations (HMOs) in the United
States. (Previous HMOs had been asso¬ 
ciated with certain industries or institu¬ 
tions.) He and Schaie realized that the 
people enrolled in the program constituted
a group that could be approached
over time. With the cooperation of the 
Group Health organization, Schaie has 
managed to collect data over seven con¬ 
secutive waves, extending until the latest 
wave in 2005. This has been an impressive 
feat of scientific management and persis¬ 
tence. 

The project has measured intelligence 
using tests based on Thurstone’s Primary 
Mental Abilities model, which includes 
separate factors for verbal reasoning, 
nonverbal reasoning, induction, and 
numerical/arithmetic reasoning. Data has 
also been gathered on personality fac¬ 
tors, health, and lifestyle. Related stud¬ 
ies have been conducted on interventions 
designed to ameliorate cognitive deterio¬ 
ration in old age. The results have been 
reported in two books* and numerous 
research papers. 

* Schaie, 1996, 2005. 


To make these numbers a bit less abstract, 
here is a hypothetical example. Since the 
measures of reasoning are closest to what 
would be considered measures of g in mod¬ 
ern theories, the comparisons will be made 
for the reasoning test, converted from stan¬ 
dard deviation units to IQ points. 


Consider two people of age thirty, born 
in 1903 and tested in 1933. We select them so 
that one is at the median level of intelligence 
in his or her 1903 birth cohort and the other is 
at the eighty-fifth percentile. By definition, 
the first person will have an IQ of 100 and 
the second an IQ of 115, by 1933 standards.

Figure 9.3. The increase in three dimensions of intelligence over
cohorts, using the 1903 birth cohort as a base. Data from Schaie,
2005, p. 135; converted to standard scores. [Horizontal axis: birth
year of cohort, starting in 1903. Plotted series: reasoning, spatial
orientation, verbal meaning.]

We now take two more people, age thirty,
but born in 1973, who are tested in 2003, and 
who are at the fiftieth and eighty-fifth per¬ 
centile by 2003 standards. They would have, 
respectively, IQs of 115 and 130 by 1933 stan¬ 
dards! People are getting smarter. 
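
The arithmetic behind this hypothetical example can be written out explicitly. The sketch assumes the conventional IQ scale (mean 100, standard deviation 15), treats the eighty-fifth percentile as one standard deviation above the median, as the example does, and takes the cohort gain on the reasoning test to be one standard deviation, which is the value implied by the numbers quoted. The function names are illustrative.

```python
IQ_MEAN = 100.0
IQ_SD = 15.0


def iq_from_z(z: float) -> float:
    """Convert a standard score within a cohort to an IQ on that cohort's norms."""
    return IQ_MEAN + IQ_SD * z


def rescore_on_earlier_norms(iq_own_cohort: float, cohort_gain_sd: float) -> float:
    """Express a later-cohort IQ on the norms of an earlier cohort.

    cohort_gain_sd is how far the later cohort's mean sits above the earlier
    cohort's mean, in standard deviation units (assumed, not measured here).
    """
    return iq_own_cohort + IQ_SD * cohort_gain_sd


COHORT_GAIN_SD = 1.0  # assumed gain from the 1903 to the 1973 birth cohort

for label, z in [("median", 0.0), ("one SD above median", 1.0)]:
    own = iq_from_z(z)
    earlier = rescore_on_earlier_norms(own, COHORT_GAIN_SD)
    print(f"{label}: IQ {own:.0f} by 2003 standards, {earlier:.0f} by 1933 standards")
```

Under these assumptions the two people born in 1973 score 115 and 130 on the earlier norms, the figures given in the example.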

9.2.2. The Cohort Effect Goes
International: The Flynn Effect

Early in the 1980s Jim Flynn, a political 
philosopher at the University of Otago, in 
New Zealand, became interested in changes 
in the difficulty of standardized tests of intel¬ 
ligence, such as the Wechsler tests and the 
Stanford-Binet tests. Flynn located seventy- 
three different studies, covering eighteen 
pairings in which the same people had been 
given both a current test and a test stan¬ 
dardized at some earlier time (e.g., the orig¬ 
inal Wechsler test, standardized in 1935-37,
and the second version, standardized in
1953-54). In seventeen of the eighteen com¬
parisons people scored higher on the ear¬
lier version of the test than on the later 
version. 10 For instance, in the comparisons of 
the original and second Wechsler tests there 
was a difference of 4.69 IQ points, approx¬ 
imately one-third of a standard deviation 
unit, over a period of seventeen years. Flynn 
concluded that whatever trait the Wechsler 
tests measure had increased steadily in the 
US population. 11 
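
The size of the Wechsler difference is easier to judge once it is converted to standard deviation units and to a rate. The short calculation below uses the figures quoted above (4.69 IQ points over seventeen years) and assumes the conventional standard deviation of 15 IQ points; the function name is illustrative.

```python
def gain_summary(points: float, years: float, sd: float = 15.0) -> dict:
    """Express an IQ gain in SD units and as a rate per year and per decade."""
    return {
        "sd_units": points / sd,
        "points_per_year": points / years,
        "points_per_decade": 10.0 * points / years,
    }


wechsler = gain_summary(points=4.69, years=17)
for name, value in wechsler.items():
    print(f"{name}: {value:.2f}")
```

The gain works out to a little under one-third of a standard deviation, or somewhat less than three IQ points per decade.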

Flynn then examined test scores in stan¬ 
dardization studies conducted in fourteen 
different industrialized nations. 12 All of 
them showed cohort effects, ranging from a 
high of twenty-five IQ points (France, 1949-74)
to a low of six points (United States,
1954-78).

In subsequent studies Flynn found that 
the increase in scores over time was 

10 The one negative finding involved comparisons of 
tests intended for populations of different ages. 

11 Flynn, 1984. 

12 Flynn, 1987. The nations involved were Austria, 
Australia, Belgium, Canada, France, German 
Democratic Republic (now part of Germany), 
German Federal Republic (now part of Ger¬ 
many), Great Britain, Japan, the Netherlands, New 
Zealand, Norway, Switzerland, and the United 
States of America. 


markedly larger for nonverbal, culture- 
reduced tests than it was for verbal tests. 
This is consistent with results from the 
Seattle longitudinal study, where the cohort 
effects on a nonverbal reasoning test were 
greater than those on a verbal compre¬ 
hension test (Figure 9.3). Flynn also con¬ 
trasted the increase in US intelligence test 
scores with decreases in SAT scores over 
the period since World War II. 13 He con¬ 
cluded that, at a descriptive level, there had 
been substantial gains in IQ scores, intel¬ 
ligence in the narrow sense, over the first 
three quarters of the twentieth century, and 
that they were concentrated in abstract rea¬ 
soning tests. He also expressed considerable 
skepticism about whether intelligence in the 
conceptual sense had similarly increased, for 
he did not believe that people today are 
markedly smarter than their ancestors. 

9.2.3. Further Documentation
of Flynn's Observations

Flynn’s research caught the public eye. Herrnstein
and Murray coined the term “Flynn
effect” in 1994. 14 In 1996 the American Psychological
Association sponsored a symposium
on possible causes of the effect. 15 In 2007 Flynn
himself published a book in which he considered
further studies and their implications. 16 Many
articles by other writers
have commented on the effect. Surprisingly, 
only minimal references have been made to 
Schaie's work, even though it was very well 
known to gerontologists. 

Research on the Flynn version of the 
cohort effect has been greatly helped by a 
social phenomenon. Although only a few 

13 The drop in scores was dramatic, especially over the 
period 1970-92. However, the contrast over cohorts 
is problematical. The SAT is taken by a self-selected 
population, people intending to enter college. Dur¬ 
ing the 1950-2000 period there was a considerable 
increase in the percentage of high school students 
who took the test. Therefore, it is hard to draw con¬ 
clusions about changes in the cognitive competence 
of the US population, in general, from changes in 
SAT scores over time. 

14 Herrnstein & Murray, 1994.

15 Neisser, 1998.

16 Flynn, 2007. 
developed countries currently have compulsory
military service, several European countries
require that young men register for
possible conscription in case of national 
emergency. When they enroll the registrants 
are given medical and psychological exami¬ 
nations, including a test of cognitive skills. 17 
Raven’s Standard Progressive Matrices or 
similar tests are used by some countries. As 
a result, it is possible to compare scores of 
virtually the entire population of eighteen- 
year-old men, on the same test, over the 
years. 

Figure 9.4 shows the test scores for Dan¬ 
ish registrants, as a function of time of 
testing. 18 This figure, which is typical of 
data obtained in other studies, shows three 
important features of the Flynn effect. The 
first is that the effect is certainly there; 
test scores increased markedly in the thirty- 
year period from 1958 until 1988. The sec¬ 
ond is that the effect is greatest at the low¬ 
est level of scores. The two top levels, the 
ninetieth and seventy-fifth percentiles, show 
very little effect. 19 The third is that there is 
some indication that the effect is slowing 
over time. Studies using data obtained since 
1990 show considerably smaller increases, 
and some have even suggested decreases in 
the 1980s and 1990s birth cohorts. 20 

9.2.4. What Does the Cohort Effect
Tell Us?

Does the cohort effect exist? Objections can 
be raised to the design of many of the studies 
of the effect, individually. Some of the prob¬ 
lems are discussed in panel 9.3. These prob¬ 
lems raise potential objections, rather than 
disproving the findings. Also, the nature of 

17 The United States currently requires registration 
but does not do any testing. 

18 Teasdale & Owen, 2000. 

19 It has been claimed that the lack of effect at the 
top is due to an artifact in scoring called the “ceiling 
effect.” If a substantial number of people score at the 
top, their scores cannot increase (J. Raven, 2000). 
This argument applies only to the highest percentile 
considered, here the ninetieth percentile. As the 
figure shows, there is clearly a trend to smaller and 
smaller cohort effects as the percentile increases. 

20 See Teasdale & Owen, 2008, for data and further 
references. 


the objections varies from study to study. 
This does not seem to matter; the results are 
surprisingly robust. Test scores rose substan¬ 
tially from the 1920s until the late 1980s, and 
then either leveled off or, possibly, dropped. 
The facts are clear. What to make of them 
is not. 

Are people really getting smarter? If the 
correlations between intelligence test scores 
and measures of socially important cogni¬ 
tive achievements, such as school grades 
and income, were close to one, the answer 
would have to be “yes.” If the correlations
were close to zero, changes in test scores
would be irrelevant to discussions of cognition
in everyday life. In fact, the correlations
are in the .4-.6 range; certainly not one,
but high enough to show that test scores 
have to be taken seriously. (More details 
are given in Chapter 10.) As was pointed 
out in Chapter 1, what correlations in this 
range show is that intelligence tests evalu¬ 
ate “life-relevant” cognitive skills and some 
skills that are unique to the testing situa¬ 
tion. Is the cohort effect due to changes 
in the life-relevant skills or the test-unique 
skills? 
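
One conventional way to read correlations of this size is to square them, which gives the proportion of variance that two measures share under a simple linear model. The sketch below applies that textbook reading to the .4-.6 range; it is an illustration, not a calculation taken from the studies cited, and the function name is hypothetical.

```python
def shared_variance(r):
    """Proportion of variance shared by two measures that correlate at r."""
    return r * r

for r in (0.4, 0.6):
    print(f"r = {r:.1f}: shared variance = {shared_variance(r):.2f}")
```

On this reading the tests and the real-world criteria share roughly 16 to 36 percent of their variance, which leaves ample room for skills that are specific to the testing situation.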

The test sophistication argument asserts 
that the cohort effect is due to an increase 
in test-unique skills, because over the years 
cognitive testing has become more a part of 
everyday experience. I do not think that this 
argument can be maintained. If test scores 
increasingly reflected test-taking skills, then 
the correlation between test scores and other 
measures of cognitive achievement should 
have fallen over time. There is no evi¬ 
dence that this is the case. Also, American 
test scores continued to rise in the 1960-80 
period, which is well after the widespread 
introduction of testing in the United States. 
The rise in test scores almost certainly rep¬ 
resents a rise in cognitive skills across the 
developed world. 

The cohort effect varies with the type 
of test. By far the greatest rise is found 
in nonverbal, g-loaded tests, such as pro¬ 
gressive matrix tests. Flynn 21 contrasts this 
rise to the stability in performance on 

21 Flynn, 1987; 2007, Chapter 2.


ENVIRONMENTAL EFFECTS ON INTELLIGENCE 




[Figure 9.4 (image not reproduced). Panel title: Secular Trends.
Horizontal axis: test date, 1958 to 1998.]

Figure 9.4. Progressive matrix test scores for Danish
eighteen-year-old men registering for military enlistment,
1958-98. From Teasdale & Owen, 2000, with permission from
Elsevier.


assessments of explicitly taught school sub¬ 
jects. IQ test scores rose from 1971 to 2002, 
especially on tests requiring abstract reasoning.
The twelfth-grade achievement scores
for the National Assessment of Educational
Progress (NAEP) tests, which focus
on school subjects, did not rise. Flynn con¬ 
cluded that there has been an increase in 
abstract reasoning skills without a concomi¬ 
tant increase in academic knowledge. In 
terms of the three-stratum model, Gf is up 
while Gc is flat. Why? 

As Flynn and others have pointed out, any 
discussion of the cohort effect has to deal 
with a paradox. Flynn estimated that there 
has been a rise of .3 IQ point per year. If 
we project this backward, to calculate the 


intelligence of past cohorts, we find that 
the mean IQ score for young male adults 
in 1942 was equivalent to 80 by 2008 stan¬ 
dards. If we accept this finding at face value, 
it leads to the conclusion that almost half 
of the soldiers who fought in World War II, 
the group that have been called “The Great¬ 
est Generation,” 22 would not meet the men¬ 
tal requirements for enlistment into today’s 
army. This conclusion is ridiculous. But 
what has happened? 
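
The backward projection behind this paradox is simple arithmetic, sketched below. The rate of .3 IQ point per year and the years 1942 and 2008 come from the text. The standard deviation of 15 and the cutoff score of 80 are assumptions made only for illustration; the cutoff is a hypothetical stand-in for an enlistment standard, chosen because it reproduces the "almost half" figure.

```python
from statistics import NormalDist

def projected_mean(rate_per_year, from_year, to_year, mean_now=100.0):
    """Mean IQ of an earlier cohort, expressed on the later cohort's norms."""
    return mean_now - rate_per_year * (to_year - from_year)

mean_1942 = projected_mean(0.3, 1942, 2008)   # 100 - 0.3 * 66, about 80
hypothetical_cutoff = 80                      # assumed, not from the source
share_below = NormalDist(mean_1942, 15).cdf(hypothetical_cutoff)

print(f"projected 1942 mean: {mean_1942:.1f}")
print(f"share below cutoff:  {share_below:.2f}")
```

With a projected mean of about 80, just under half of the 1942 cohort falls below a cutoff of 80, which is the implausible conclusion the text goes on to question.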

To answer that question we have to ask 
what might cause the cohort effect in the 
first place. The cohort effect cannot be due 
to genotypical changes. There is a negative 

22 Brokaw, 1998. 
Panel 9.3. Design Issues in the
Study of Cohort Effects 

Studies of the cohort effect have taken 
two forms. The first, which is typified 
by the Tuddenham study, some aspects 
of the Seattle Longitudinal Study, the 
normalization comparisons for Raven’s 
Standard Progressive Matrices, and the 
various European studies of enlistees, 
compares the scores obtained on the same 
test by people of the same age, but from 
different cohorts - for example, regis¬ 
trants for the military in Denmark in 1958, 
and then again in 1978. A generalization is 
then drawn to some larger population. I
will call this the same-test, different-cohorts 
paradigm. Drawing conclusions from this 
design requires two assumptions. 

The first is that the test is a meaning¬ 
ful way to evaluate cognition in different 
cohorts. This seems a reasonable assump¬ 
tion for “culture-reduced” tests, such as 
progressive matrix tests, given to the 
same overall population - for instance, 
the population of the United States in 
1933 compared to the population in 1983. 
It is also reasonable if the cohorts are 
not far apart in years, for cultures do not 
change that quickly. 

There are situations where this as¬ 
sumption would not be reasonable. For 
instance, if the tests were to be con¬ 
ducted in a developing country it might 
be the case that proportionately more 
people in the more recent cohort would 
be accustomed to the testing paradigm, 
due to dramatic increases in schooling, 
and with it, testing. Such an argument 
is much less tenable for a comparison 
of cohorts in an industrially developed 
country. 

Tests of verbal behavior and general 
knowledge have to be modified for the 
cohort involved. To take an extreme, 
albeit slightly frivolous, example, in 1938
the term gay bachelor referred to an 
unmarried man who enjoys the company 
of women. By 2008 the term had come 


to acquire a rather different meaning. 
Any test involving cultural knowledge 
also faces the danger of being frozen in 
time or restricted to a particular cultural 
group. In order to maintain widespread 
applicability, commercial tests that eval¬ 
uate crystallized knowledge (Gc) tend 
to evaluate “least common denomina¬ 
tors,” knowledge that is held widely 
through the society. Common knowledge 
changes over cohorts. The tests have to be 
changed accordingly. 

The second assumption is that the 
cohorts are similar samples of a larger 
population. Findings on the cohort effect 
have been unhesitatingly generalized to 
entire populations. In fact, few studies 
utilize a random sample of the popula¬ 
tion for which generalization is intended. 
For instance, the 1992 United States stan¬ 
dardization sample for the widely used 
Raven’s Standard Progressive Matrices 
test was entirely drawn from Des Moines, 
Iowa. To what extent, then, can we gen¬ 
eralize to the population of the United 
States?* Tuddenham compared US mili¬ 
tary recruits from World War I to recruits 
from World War II. Recruitment proce¬ 
dures were not the same in the two wars, 
so is it valid to make an inference about 
changes in population IQ? The Seattle 
Longitudinal Study drew from a popula¬ 
tion of enrollees in a health care program. 
To what extent is this population repre¬ 
sentative of the US population in general? 
Also, to what extent did the nature of 
the enrollees in the health care program 
change over the almost half-century life¬ 
time of the project? 

Because of questions like these the 
European military registration studies 
are particularly valuable. They involve 
repeated sampling of the same subpop¬ 
ulation, young men eligible for military 
service, over fairly brief time intervals. 
The fact that a cohort effect appears in 
several European studies of this group is 
an important confirmation of other same- 
test, different-cohort designs. 
Flynn relied on a second paradigm, in
which two tests, with norms established 
at different times, were given to the same 
sample. Call this the two-test, one-cohort 
design, and refer to the test with the ear¬ 
lier norms as Test 1 and the second test as 
Test 2. Suppose that scores are higher, in 
terms of percentiles defined by the origi¬ 
nal standardization, on Test 1 than on Test 
2. This implies that the standardization 
sample for Test 1 had lower abilities than 
the standardization sample for Test 2. 
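
The logic of the two-test, one-cohort design can be made concrete with a small sketch. All of the numbers are hypothetical: ability is placed on an arbitrary latent scale, the Test 2 standardization sample is assumed to be 0.3 standard deviation more able than the Test 1 sample, and scores are reported on the usual scale with a mean of 100 and a standard deviation of 15.

```python
def iq_against_norms(ability, norm_mean, norm_sd=1.0):
    """IQ assigned to a fixed ability level when scored against a norm group."""
    z = (ability - norm_mean) / norm_sd
    return 100 + 15 * z

# Hypothetical latent-ability means of the two standardization samples.
test1_norm_mean = 0.0   # earlier norms
test2_norm_mean = 0.3   # later norms, assumed 0.3 SD more able

ability = 0.5           # one examinee, the same person on both tests
iq_test1 = iq_against_norms(ability, test1_norm_mean)
iq_test2 = iq_against_norms(ability, test2_norm_mean)

print(f"Test 1: {iq_test1:.1f}  Test 2: {iq_test2:.1f}  "
      f"difference: {iq_test1 - iq_test2:.1f}")
```

The same person scores higher against the earlier norms, and the size of the difference (here 4.5 points) recovers the assumed gap between the two standardization samples.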

The argument depends on the two 
standardization samples being equally 
representative of the general population 
at the time that the standardization is 
done. Considerable care is taken in stan¬ 
dardizing tests such as the Stanford-Binet 
and Wechsler tests, as these tests are 
widely used in clinical practice, and to 


establish legal competency and/or qual¬ 
ification for special education programs. 
The cohort effect can be illustrated with 
these tests. 

* Raven, 2000. Des Moines was chosen because, 
on some statistical criteria, such as age distri¬ 
bution and distribution of ethnic groups, the 
city was claimed to have matched the United 
States as a whole. The same logic was used 
for the British standardization, where the sam¬ 
ple was drawn from Dumfries, Scotland. The 
problem with this approach is that the match¬ 
ing is solely on those variables that the inves¬ 
tigator thinks are appropriate, and leaves any 
other measures free to vary. For instance, in 
the United States educational standards vary 
widely across states and even across school dis¬ 
tricts within states. Was the quality of educa¬ 
tion in Des Moines equivalent to the typical 
quality of education in the United States at the 
time? Random sampling avoids such problems 
by equating statistical expectations for all covari¬ 
ates, not just for those felt to be important by the 
investigator. 


correlation between IQ scores and fertility
(number of live births per woman). 23
Because of this, some investigators have 
argued that on a population basis the genetic 
potential for intelligence has dropped, even 
though IQ scores have risen. 24 

While this argument has a certain amount 
of merit, it is not completely compelling. 
Differential fertility, alone, does not guarantee
a “dysgenic effect,” lowering the mean
intelligence of the population by lower¬ 
ing the genetic potential. This result would 
depend upon a number of other things, 
including the initial relative frequencies of 
various genetic potentials and the mating 
practices in the society. 25 One could create a 
situation in which the genetic potential of a 
population increased, even though there was 
a negative correlation between intelligence 
and fertility. However, this can happen only 
if the society follows mating practices that 
are not found in any of the industrialized 
nations where the cohort effect has been 

23 Herrnstein & Murray, 1994. 

24 Herrnstein & Murray, 1994, pp. 348-352; Lynn, 1998. 

25 Loehlin, 1998; Preston, 1998.


observed. 26 The cohort effect is environ¬ 
mental. 

There is an analogy between the cohort 
effect and the finding that the heritability 
coefficient is .50 or higher (Chapter 8). The 
heritability finding shows that genetics is 
important, but it does not identify the rele¬ 
vant genes. The cohort effect does the same 
thing for environmental influences. It shows 
that environmental influences are impor¬ 
tant, but does not identify the influences. 
If we look at any time-linked phenomenon 
extending over a period of seventy to one 
hundred years, we find that there have been 

26 This situation could arise if males with high genetic 
potential had more offspring than males with lower 
genetic potential. The discrepancy would have to be 
large enough to overcome the negative correlation 
between intelligence and female fertility. This could 
happen in a polygamous society, if the males with 
high genetic potential had more access to women 
than males with low genetic potential. There may 
be one spectacular case. Genetic analyses indicate 
that 8% of the men living in Central Asia (approxi¬ 
mately .5% of the world's men) are descended from 
one individual who lived approximately a thousand 
years ago. If we combine this observation with his¬ 
torical records, suspicion falls on Genghis Khan, the 
founder of the Mongol Empire (Zerjal et al., 2003). 
so many, highly collinear social changes that
it is virtually impossible to single out any 
one of them and say “That’s it.” 

The analogy goes further. Behavioral 
geneticists and biomedical scientists have 
found some genes that are very bad for intel¬ 
ligence, but have not been able to isolate the 
genes that produce high intelligence. We can 
identify environments that are very bad for 
intelligence, but have not been able to iden¬ 
tify environments that are very good for it. 

9.3. The Physical Environment 

Several features of the physical environment 
are associated with low intelligence test 
scores. These include poor health practices, 
frequent or severe injuries, substance abuse, 
poor nutrition (especially in infancy), and 
exposure to atmospheric pollutants. In some 
cases the direction of causality is clear-cut. 
Any physical agent that damages the struc¬ 
ture of the brain or interferes with brain pro¬ 
cessing may reduce intelligence, on either 
a temporary or a permanent basis. That is 
simply common sense. In other cases there 
is a statistically reliable association between 
intelligence and some potentially harmful 
agent, but causality is difficult to determine. 

One of the reasons for this is a sort 
of chicken-and-egg problem: which came 
first, reduced intelligence or exposure to the 
agent? Linda Gottfredson, who has written 
widely on the social implications of intelli¬ 
gence, has pointed out that the task of fol¬ 
lowing health and safety regimens is itself 
cognitively challenging. 27 To take a specific 
case, intelligence test scores are predictors 
of involvement in motor vehicle accidents, 
and such accidents are one of the commonest
causes of head injury. 28 Granted that
individuals who have suffered head injuries 
behave unintelligently at times, how likely 
were they to behave unintelligently prior to 
the accident? 

Collinearity makes it hard to determine 
causality. Exposure to dangerous conditions 

27 Gottfredson, 2007b. 

28 O’Toole, 1990; Smith & Kirkham, 1982. 


and low intelligence test scores are corre¬ 
lated with many other variables, such as 
socioeconomic status (SES), low parental 
intelligence, and inadequate access to good 
schools, that could themselves adversely 
affect intelligence. Picking out just one of 
these factors as the cause of low intelligence 
becomes hard to justify. 

In order to assert that a particular incident 
or environmental condition has affected 
intelligence it is necessary to show (a) that 
low intelligence did not contribute to the 
incident or condition, and (b) that other 
possibly relevant conditions have been mea¬ 
sured or controlled experimentally. In the 
ideal case pre-morbid measures of intelli¬ 
gence, taken before the incident or expo¬ 
sure, are compared to measures taken after 
the fact. 

9.3.1. Direct Insults to the Brain 

Chapter 7 discussed the relationship bet¬ 
ween brain structures and intelligence. 
There it was pointed out that damage to 
certain areas of the brain will result in loss 
of intelligence and, if the injury is suffi¬ 
ciently discrete, may disrupt certain intel¬ 
lectual functions and not others. Damage to 
the forebrain-parietal system will disrupt the 
working memory-attentional control complex,
thus damaging the functions underlying
general reasoning (Gf or g). Damage in
the medial temporal region, and especially 
to the hippocampus, will disrupt the ability 
to form new memories (anterograde amne¬ 
sia). The disability can be severe enough 
that the affected person has to be in custo¬ 
dial care for life. Paradoxically, some writers 
have claimed that this does not affect intel¬ 
ligence, on the ground that IQ test scores 
may not be lowered. Because any reasonable
definition of intelligence has to include
the ability to learn from experience, the fact 
that profound anterograde amnesia does not 
influence IQ test scores simply shows that 
test scores are only partial indices of intelli¬ 
gence. 

Less dramatic, but important, losses of 
cognitive functions can occur when the 
brain is subjected to apparently minor 
physical damage. Closed head injury, concussion,
is of special interest because of its
prevalence; there are more than a million 
cases annually in the United States. 29 Closed 
head injuries are often followed by a period 
of disorientation that is obvious both to the 
individual and to others. In most cases this 
subsides. When conventional intelligence 
tests are given, a year or more later, effects 
are often not found. When effects are found 
they are usually on abstract, nonverbal rea¬ 
soning. Tests that emphasize verbal cogni¬ 
tion (or Gc, if you wish to use the Gf-Gc 
model) appear to be much less sensitive to 
the aftermath of severe concussions. So, if 
we were to restrict ourselves to the intel¬ 
ligence test data alone, we might conclude 
that “a knock on the head” is not all that 
serious. A closer look shows that that is not 
the case. 

When patients are tested using laboratory 
tasks evaluating working memory functions, 
effects will be found. However, the labora¬ 
tory tests are considerably more searching 
than the memory evaluations included in 
an intelligence test. The people who have 
been injured often do not report problems 
in everyday life, even though they have trou¬ 
ble with the laboratory tests. We might then 
conclude that the residual effects of concus¬ 
sion are not serious enough to be of concern. 
But that is not the case. 

Investigators at the United Kingdom’s 
Applied Psychology Unit (Cambridge) went 
a step further. They asked the living part¬ 
ners of people who had suffered concus¬ 
sions whether or not there had been any 
long-term effects. The partners replied that 
there had been, and furthermore, the sever¬ 
ity of the deficit, as reported by the partner 
but not the affected individual, was corre¬ 
lated with the difficulty the affected person 
had in dealing with the laboratory tasks. 30 
In addition to being interesting in itself, this 
study showed the importance of obtaining 
information about a person’s everyday per¬ 
formance, as well as observing performance 

29 Information downloaded from www.healthline.com/adamcontent/concussion, June 2009.

30 Sunderland, Harris, & Baddeley, 1983. 


during an out-of-context situation, such as 
a laboratory study or a conventional testing 
session. 

A second study shows how concussion 
can act as a distal influence that increases 
the risk of incurring a condition that acts as a 
proximal influence on intelligence. The peo¬ 
ple studied were elderly (sixty-plus) pairs of 
twins, where one twin suffered from Parkinson’s
disease, which results in a loss of intelligence
in its later stages, and the other did
not. The risk of incurring Parkinson’s dis¬ 
ease tripled for those who had experienced 
head injuries, even though the head injuries 
had occurred, on average, thirty-seven years 
before the study was conducted. 31 

As these two studies illustrate, long-term 
damage to intelligence can result from what 
are, at the time, apparently recoverable 
injuries to the brain. Stories of punch-drunk 
boxers are not just legend. 32 

9.3.2. Prenatal and Infant Health Issues

A great deal of attention has been devoted 
to prenatal and infant development, because 
this is when the foundations of cognition are 
laid down. In their first year infants acquire 
a great deal of information about their lan¬ 
guage and about social interactions. 33 To 
what extent do individual differences in 
cognitive function in infancy predict later 
indices of intelligence? Infancy is also a 
period of high physiological vulnerability. 
To what extent are these early indicators 
sensitive to environmental variables? 

Infant development is often measured by 
the Bayley scales of development, devel¬ 
oped originally by Nancy Bayley and her 
colleagues at the University of California, 
Berkeley, and updated periodically. 34 These 
scales document the age of normal occur¬ 
rence of a variety of behaviors, such as 

31 Goldman et al., 2006. 

32 Nor do we need to rely on legend. Survey results 
have shown that the incidence of memory problems
is markedly elevated among retired professional
football players (New York Times, Sept. 30,
2009, p. A1).

33 Gopnik, Meltzoff, & Kuhl, 1999.

34 Bayley, 2005. 
crawling, toddling, vocalization, and reactivity.
Contemporary middle- and upper-class parents
are notoriously concerned that
their children be on schedule. This con¬ 
cern may be a bit overdone. The correlation 
between scores on developmental scales that 
are based on activity and vocalization over 
the first thirty-six months of life, and scores 
on adult intelligence tests is nearly zero. 35 
However, this does not mean that the devel¬ 
opment of information-processing and ver¬ 
bal measures can be disregarded. By the age 
of three, tests that involve vocabulary cor¬ 
relate in the q-q range with IQ scores for 
young adults. 

One of the most important things an 
infant has to do is to recognize stability 
and change in the environment. The infant’s 
ability to do this can be evaluated by mea¬ 
suring habituation, the ability to discrimi¬ 
nate between novel and repeated stimuli. 36 
Let us take a closer look at the procedure 
used. 

Joseph Fagan, a professor at Case University,
and his colleagues have measured habituation
by showing infants an “interesting”
picture, often a picture of another infant. 
After the infant has viewed one picture it 
is removed, and then presented again, along 
with a new picture. The measure of habitu¬ 
ation is the extent to which the infant looks 
at the new picture. This simple test for six- 
to twelve-month-old infants has a correla¬ 
tion of .32 (corrected for reliability, .59) with 
measures of adult intelligence at twenty-one 
years of age. 37 
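
The phrase "corrected for reliability" most plausibly refers to the classical correction for attenuation, in which the observed correlation is divided by the square root of the product of the two measures' reliabilities. The source does not say which correction was used or what the reliabilities were, so the sketch below only works backward from the two reported correlations. The function names are hypothetical.

```python
import math

def disattenuate(r_observed, rel_x, rel_y):
    """Classical correction for attenuation (Spearman's formula)."""
    return r_observed / math.sqrt(rel_x * rel_y)

def implied_reliability_product(r_observed, r_corrected):
    """Product of the two reliabilities implied by the reported correlations."""
    return (r_observed / r_corrected) ** 2

product = implied_reliability_product(0.32, 0.59)
print(f"implied reliability product: {product:.2f}")

# Round trip: applying the correction with that product recovers .59.
print(f"corrected r: {disattenuate(0.32, product, 1.0):.2f}")
```

If this is the correction that was applied, the product of the two reliabilities was about .29, which would be consistent with a brief infant measure that is itself not very reliable.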

Birth weight, one of the most commonly 
used indices of prenatal health, is corre¬ 
lated with intelligence, for both premature 
and normal term infants. One study estimated
that there is an increase of between
.3 and .5 IQ points for every 100 grams of
birth weight over 2,500 grams (about 5.5 
pounds). 38 This finding, standing alone, is 
ambiguous, because maternal IQ is posi- 

35 Bayley, 1968. 

36 Brody, 1992, p. 232; Fagan, Holland, & Wheeler, 

2007. 

37 Fagan, Holland, & Wheeler, 2007. See also the ref¬ 
erences therein for citations to earlier studies. 

38 Matte et al., 2001. 


tively correlated with the birth weight of 
the child. Collinearity again! When maternal 
IQ is controlled statistically, the relationship 
between IQ and birth weight drops but does 
not disappear. 39 The finding is not confined 
to cases of abnormally low birth weight. 
Differences in intelligence between heav¬ 
ier and lighter babies in the normal range 
(2,000 grams and above) have been reported 
in studies in a variety of North American 
and European countries, and in both white 
and African American populations in the 
United States. 40 
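
To give a sense of scale, the per-100-gram estimate quoted earlier can be turned into a range for a particular birth weight. The sketch below assumes that the relationship is linear above the 2,500 gram baseline, which is a simplification. The figures describe a statistical association that shrinks when maternal IQ is controlled, not a causal effect, and the function name is hypothetical.

```python
def iq_difference_range(birth_weight_g, baseline_g=2500, low=0.3, high=0.5):
    """Range of IQ points associated with weight above the baseline.

    low and high are IQ points per 100 grams, taken from the estimate
    quoted in the text. Weights at or below the baseline return (0, 0).
    """
    extra_hundreds = max(0, birth_weight_g - baseline_g) / 100
    return low * extra_hundreds, high * extra_hundreds

low_points, high_points = iq_difference_range(3500)
print(f"{low_points:.1f} to {high_points:.1f} IQ points")
```

For a 3,500 gram infant the estimate corresponds to roughly three to five IQ points relative to a 2,500 gram infant.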

The relationship does not solely reflect 
any tendency for bright mothers to have 
children that are both heavier and have a 
higher genetic potential for intelligence, as 
separate effects. This was shown in an ele¬ 
gant study of two large samples, one in New 
Zealand and one in the United Kingdom. 41 
Because of the size of the samples it was pos¬ 
sible to compare birth weights and IQ scores 
in monozygotic (MZ) twins, thus controlling 
for genetic influences. The heavier twins had 
higher IQ scores, demonstrating the impor¬ 
tance of the prenatal environment indepen¬ 
dent of genotype. 

We can take the comparison to an 
extreme by looking at “preemies,” new¬ 
borns with a gestational age of less than 
thirty-two weeks, many of whom have birth 
weights well below 2,000 grams. These chil¬ 
dren, on the average, have IQ scores almost 
one standard deviation unit below a con¬ 
trol group. 42 In general, nonverbal function¬ 
ing and abstract reasoning are more affected 
than verbal functioning. Considering the 
links between information processing and 
g, it is not surprising to find that prema¬ 
ture infants do poorly in childhood (eight 
to twelve years) on tests of working mem¬ 
ory and attentional control. 43 

Because birth weight predicts future 
intelligence, it is of interest to know what 
variables may lead to low birth weight. By 
far the largest risk factor is premature birth, 

39 Deary, Der, & Shenkin, 2005. 

40 Dombrowski, Noonan, & Martin, 2007. 

41 Newcombe et al., 2007. 

42 Esbjorn et al., 2006. 

43 Bayless & Stevenson, 2007. 




Panel 9.4. Fetal Alcohol Syndrome 

Fetal alcohol syndrome (FAS) is a defect 
in intelligence occurring in children whose
mothers drank heavily (more than four 
drinks per occasion) during pregnancy. 
Prevalence in the US is somewhat less 
than one case per thousand births. 
Prevalence is higher in social groups 
that have high rates of alcohol abuse. 
The problem is particularly acute in the 
Native American population, where the 
incidence of FAS has been estimated to 
be from 1.5 to 2.5 cases per thousand 
births. 

The effects can be devastating. FAS 
is characterized by facial malformation 
(short noses, small eye openings, thin 
upper lips, and skin folds over the eyes, 
among other features) and severe cogni¬ 
tive retardation. A broader term, fetal 
alcohol spectrum disorder (FASD), is 
used to refer to prenatal alcohol damage 
that includes all damage due to alcohol 
consumption during pregnancy. 

The National Institutes of Health
guidelines advocate complete abstinence 


during pregnancy. Damage can be 
inflicted early in the first trimester of 
pregnancy, so a woman might damage the 
fetus before becoming aware that she is 
pregnant. 

The fetus can be put at risk by drink¬ 
ing in the social range, below the level 
of intoxication. Anne Streissguth and her 
group at the University of Washington 
Medical School have conducted longitu¬ 
dinal studies of the children of women 
who either abstained completely or were 
social drinkers during their pregnancies. 
Among the latter, small but clear indi¬ 
cations of impulsive decision making and 
difficulties with attentional control were 
detected in their children when they were 
fourteen years old. The degree of impul- 
sivity varied with the amount drunk dur¬ 
ing pregnancy, even though none of the 
mothers would be considered to have 
been alcoholics.* 

General information on fetal alcohol syndrome was
downloaded from www.nlm.nih.gov/medlineplus/
fetalalcoholsyndrome.html#cat1, June 2008, and
links derived from this page.

* Sampson et al., 1997; Connor et al., 2001.


although a myriad of other causes can also 
affect maternal and fetal health. The opti¬ 
mal age for childbearing, at least as far as 
the risk of low birth weight is concerned, is 
from twenty-five to thirty-two; women out¬ 
side this age range are at heightened risk for 
premature birth and for having a low birth 
weight child. 

Not surprisingly, expectant mothers are 
strongly advised to avoid ingesting any sub¬ 
stance that is toxic to neural development. 
In our own society perhaps the common¬ 
est such substance is alcohol. Social drink¬ 
ing can have an effect, and maternal drink¬ 
ing to the point of serious intoxication can 
have devastating influences on fetal intellec¬ 
tual development. This is discussed in more 
detail in panel 9.4. 

We seem to be close to the situation 
we encountered with respect to molecular 


genetics; we know neonatal-infant environ¬ 
mental effects that can harm intelligence. 
Is there any way to improve intelligence, 
within the normal range? Some interesting 
things have been tried, but there is little 
objective evidence that various diets, exer¬ 
cise regimens, or even having the expec¬ 
tant mother listen to classical music (see 
panel 9.5) does very much good. The way 
to make sure children are born highly intel¬ 
ligent still eludes us. 

9.3.3. Nutrition 

Nutrition , in the broadest sense, refers to 
anything that people ingest as food or drink. 
This excludes certain substances that are 
ingested but not for the purpose of fuel¬ 
ing metabolism, such as the ingestion of a 
psychoactive drug. There are a few cases of 
Panel 9.5. The Mozart Effect

There has been a good deal of inter¬ 
est in the possibility that listening to 
music - and for some reason, listening 
to the works of Mozart - might improve 
spatial-visual reasoning.* The claim has 
even been made that the effect can be 
observed in rats! The phenomenon has 
been dubbed the "Mozart effect." 

The first reports of the effect led 
to a crescendo of replications, and an 
exchange of letters on the topic in Nature 
in 1999.† My own belief is that the effects,
if any, are small and transient, and are 
probably due to structured music tem¬ 
porarily stimulating some of the brain 


regions used to solve spatial-visual prob¬ 
lems. It seems unlikely but not impos¬ 
sible that a permanent enhancement of 
spatial-visual reasoning results. Edward 
Zigler, a highly respected specialist in 
developmental psychology, has criticized 
the fascination with the (alleged) effect 
as yet another search for a “quick fix” in
early childhood education and support. 
Zigler worries that the attention focused 
on such fads distracts public attention 
from the development of less dramatic, 
more expensive, high-quality infant care 
and pre-school programs.‡

* Rauscher, Shaw, & Ky, 1993. 

† Chabris, 1999.

‡ Jones & Zigler, 2002.


substances that have been ingested for nutri¬ 
tional purposes, but that contain pathogens 
that can produce a loss of intelligence. 
Panel 9.6 describes two such cases. 

Intuitively, it makes good sense to think 
that nutrition will influence intelligence, 
especially during periods of neural growth. 
It turns out to be difficult to prove this, 
for several reasons. We want to distin¬ 
guish between temporary effects, during 
a period of malnutrition, and permanent 
effects, exhibited following recovery from 
malnutrition. It is also necessary to distin¬ 
guish between the effects of brief periods 
of malnutrition (and when, in development, 
these periods take place) and the effects of 
chronic malnutrition. The type of malnutri¬ 
tion is also an issue. A diet may have ade¬ 
quate caloric intake and still be deficient in 
protein, iron, or other substances important 
for neural development. There is also the 
problem of distinguishing between cogni¬ 
tive and temporary attentional effects. Mal¬ 
nourished people are physically weak and 
have trouble concentrating. This may lead 
to underperformance both in a testing ses¬ 
sion and, more importantly, in any situa¬ 
tion involving cognitive demands over a long 
period of time. 


These are conceptual problems. There 
are also some major practical problems. 

It would not be ethical to subject peo¬ 
ple to malnutrition in a controlled study, 
so correlational and epidemiologic studies 
are necessary. Reliable records of nutritional 
intake may not be available. Collinearity is 
an issue. Malnutrition to a level that would 
influence neural development, and hence 
intelligence, is rare in the industrially devel¬ 
oped countries. When it does occur it is 
likely to be accompanied by poor general 
parenting practices and a lack of social sup¬ 
port for mother and child. In the develop¬ 
ing countries malnutrition does occur, but 
it tends to occur in the poorest areas of 
the country, those with the least access to 
schools and medical facilities. In general, 
more intelligent, better-educated mothers 
provide better nutritional environments for 
their children. 44 Genetic and social effects 
on the child's intelligence can appear to be 
due to nutrition unless care is taken to mea¬ 
sure the appropriate covariates. 

Nonetheless, the problem is an impor¬ 
tant one, so researchers have persevered. We 

44 For example, a study of Egyptian mothers by Wachs & McCabe, 2001.
Panel 9.6. Two Cases of Very Bad
Dietary Practices 

In America and Europe today a great 
deal of publicity is given to dietary prac¬ 
tices. Most of that concern is directed 
toward obesity and cardiovascular health. 
There are some cases, though, in which 
the concern is overingestion of foods con¬ 
taining pathogens that may influence the 
brain. The famous Salem Witch Trials in 
seventeenth-century Massachusetts were 
a response to bizarre behavior by chil¬ 
dren. A case has been made that the 
behavior may have been due to a neu¬ 
rotoxic fungus in bread that the chil¬ 
dren ate.* But this is only speculation. 
Here are two better-documented exam¬ 
ples. One involves an exotic social prac¬ 
tice that, thankfully, is no longer with us. 
The other illustrates the dangers that can 
result when a pathogen gets into the vast, 
almost unseen food chain in industrially 
developed nations. 

Kuru is a disease that attacks the brain 
and central nervous system. It is caused 
by a slowly acting agent called a prion that 
destroys neural structures.† In the 1950-60
period kuru broke out in the highlands
of New Guinea, then one of the
world’s most primitive areas. The out¬ 
break was due to cannibalism; people 
became infected by eating the brains of 
people who were already infected with 
kuru. The connection was not immedi¬ 
ately obvious because of the long period 


of time that typically elapses between 
infection and the manifestation of the 
disease. Today kuru has virtually disap¬ 
peared because of the modern Papua 
New Guinea government’s policy of 
strongly discouraging the practice of cannibalism.‡

A related brain disorder, Creutzfeldt- 
Jakob disease, can be acquired by inges¬ 
tion of infected tissue from meat. (The 
risk of acquiring the disease is also 
partially under genetic influence.) The
purity of meat is regulated by social prac¬ 
tices, especially the screening of cattle 
for the presence of a bovine form of 
the disease (“mad cow” disease) prior to
slaughter and rendering into commercial
beef.§ The fact that infected meat could
produce Creutzfeldt-Jakob disease was 
first discovered in Britain in 1996, fol¬ 
lowing a sharp rise in the incidence of 
the disease. Since then screening of cat¬ 
tle has greatly increased. Nevertheless, in 
2008 there were major political protests 
when the government of the Republic of 
Korea decided to permit importation of 
US beef although mad cow disease had 
been detected in the US herd. 

* Caporael, 1976. 

† Gajdusek & Zigas, 1957. Gajdusek received the
Nobel Prize in Medicine in 1976 for the discovery
of slow-acting prions.

‡ Information recovered from www.ninds.nih.gov/
disorders/kuru/kuru.htm, June 2008.

§ Information recovered from www.ninds.nih.gov/
disorders/cjd/detail_cjd.htm, June 2008.


will look at a few key studies. Most of these 
have involved infants or young children, as 
they are perceived to be most vulnerable to 
the effects of malnutrition. There are a few 
studies of the relation between nutrition and 
adult cognition. 

Brief periods of malnutrition do not 
appear to leave permanent effects, even 
when the malnutrition occurs in utero or 
in infancy. The evidence for this comes 
from a large, quasi-experimental study that 


occurred in the Netherlands as a byprod¬ 
uct of the battles in Western Europe dur¬ 
ing World War II. The events and the study 
are described in panel 9.7. The gist of the 
findings was that male children exposed to 
a few months of intense starvation during 
their infancy did not show any effect of that 
experience on cognition, when they were 
tested nineteen years later. Apparently the 
effects of severe short-term malnutrition are 
reversible. 






HUMAN INTELLIGENCE 


Panel 9.7. The Hunger Winter Study 

A dramatic and tragic incident in World 
War II provided one of the best demon¬ 
strations that temporary malnourishment 
does not have a permanent effect on cog¬ 
nition. 

At the beginning of 1944 Western 
Europe was occupied by Nazi German 
forces. In June 1944 American and British 
troops, together with smaller Allied con¬ 
tingents, landed in France and began 
a steady march eastward toward Ger¬ 
many. In September 1944 British para¬ 
troopers attempted to seize bridges on 
the Rhine at Arnhem in the Netherlands. 
The plan was for the paratroopers to link 
up with British ground forces advanc¬ 
ing from the southwest. The combined 
forces would then cross the bridges into 
Germany. 

The Dutch railway workers went on 
strike, hoping to cut off supplies to the 
Germans. However, the British ground 
forces were unable to advance, and the 
paratroopers were overrun. To retaliate 
against the Dutch the Nazis imposed a 
transportation embargo on the northern 
Dutch cities under their control. Star¬ 
vation ensued. At one point the esti¬ 
mated caloric content of rations was 
down to 640 calories per adult per day. 


(Depending on age and size, the recom¬ 
mended US daily caloric intake is some¬ 
what over 2,000 calories.) Meanwhile the 
southern Dutch cities, under Allied con¬ 
trol, received adequate food. The starva¬ 
tion period lasted from September until 
March, when American troops seized 
a bridge across the Rhine at Remagen. 
Allied forces crossed rapidly into Ger¬ 
many, and the Dutch cities were relieved 
and supplied. The war in Europe ended 
shortly thereafter. 

In the Netherlands men register for 
military service at age nineteen. Reg¬ 
istrants are given a progressive matrix 
test. Zena Stein and her colleagues 
from Columbia University compared the 
scores of men who were either in utero 
or neonates during the four months of 
the starvation period to the scores of 
men who had been born at the same 
time, but in the cities under Allied con¬ 
trol. Other records of mental capabilities, 
such as the frequency of mental retarda¬ 
tion, were also compared across the two 
populations. There was no difference in 
any indicator of mental competence. This 
study of over 125,000 men provides strik¬ 
ing evidence of the resilience of human 
cognition.* 

* Stein et al., 1972.


Chronic, prolonged malnutrition is some¬ 
thing else. Numerous studies in develop¬ 
ing nations have provided evidence that 
prolonged malnutrition is associated with 
low test scores, particularly on nonverbal 
tests. A closer look reveals an interesting 
pattern of defects. Malnourished children 
are described as less attentive, impulsive, 
and easily distracted. Researchers primar¬ 
ily interested in nutrition describe this as 
a confounding variable, saying that it is 
unclear whether the effects of malnutri¬ 
tion are on intelligence, or whether they 
are on attention. 45 If we consider these 

45 Sigman & Whaley, 1998. 


observations in the light of studies of the 
role of basic information processing in intel¬ 
ligence, we reach a somewhat different con¬ 
clusion. 

The behaviors that characterize malnour¬ 
ished children are indications of lack of 
attentional control, which is a vital part of 
the working memory complex. Individual 
differences in working memory and the con¬ 
trol of attention are highly related to intelli¬ 
gence, especially measures of g and Gf. Pro¬ 
longed malnutrition also influences habitua¬ 
tion, which, as we have seen, is an indicator 
of infant intelligence. 

Viewed in this light, malnutrition directly 
influences intelligence test scores, because 
the act of taking the test requires exercise
of the working memory complex. The
same deficiencies in information process¬ 
ing influence young children's school per¬ 
formance. This produces a deficiency in the 
knowledge-based aspects of intelligence. A 
physical agent, malnutrition, results in an 
inability to benefit from environmental sup¬ 
ports that improve cognition. 

This conclusion is reinforced by stud¬ 
ies of the beneficial effects of interven¬ 
tion. Two sources of intervention have been 
tried: improvements in general nutrition and 
improvements in intake of specific nutrients, 
primarily iron and protein. Iron is important 
because iron-deficiency anemia leads to gen¬ 
eral listlessness and, once again, inability to 
focus attention. Proteins are considered nec¬ 
essary for development of the neural system. 

The results of a study conducted in rural 
Guatemala are particularly informative. Pro¬ 
tein supplementation was provided for some 
children, while a nonprotein supplement 
was offered to others. Protein supplemen¬ 
tation improved various measures of cog¬ 
nitive performance. In addition, there was 
an important interaction. The greatest gains 
occurred when protein supplementation 
was combined with school attendance. 46 
This is consistent with the argument that 
appropriate nutrition will benefit working 
memory and related attentional processes. 
These then make it possible to take advan¬ 
tage of an environment that supports the 
development of cognitive skills. For optimal 
effect a program to help children recover 
from malnutrition should be combined with 
an educational program. 

The sorts of serious malnutrition that lead 
to cognitive deficits are generally not found 
in the industrially developed countries. In 
these countries excess caloric input, which 
can lead to a quite different set of problems, 
is more of an issue. When severe malnutri¬ 
tion occurs in the developing countries, it 
is usually because of a situation over which 
the individual family has little control - war, 
famine, or drought. The next physical envi¬ 
ronmental effect that we look at, alcoholism, 

46 Pollitt et al., 1993. 
is a matter of lifestyle, and is found widely
in Europe, the Americas, and in industrial 
Asia. 

9.3.4. Alcohol Abuse 

Alcohol is the commonest, and most abused, 
recreational drug in the world. We have 
already discussed the effects of mater¬ 
nal alcohol consumption on the fetus 
(panel 9.4). Here we concentrate on the
effects on the consuming adult. 

While definitions of abuse have shifted 
over time and place, present psychiatric cri¬ 
teria distinguish two forms of alcoholism: 
alcohol dependence, in which the affected 
person can scarcely go without a drink, and 
alcohol abuse, which refers to people who 
consume alcohol frequently, and heavily, 
but can go for periods of time without drink¬ 
ing. According to the US National Institutes 
of Health (NIH) statistics for 2001-02, 75 
of every 1,000 adults over the age of eigh¬ 
teen, or 7.5%, fell into one of these two 
categories. (The figure includes “recovering
alcoholics,” who are not currently drinking.)
For reference, the incidence of heart dis¬ 
ease, the commonest cause of death in the 
US, is slightly greater than 8%. There is a 
genetic component to alcoholism; the close 
relatives of alcoholics are at heightened risk 
for alcoholism. 47 However, the genetic
component is by no means a determiner.

Alcoholism is diagnosed using four 
behavioral criteria. They are: 

Craving: A strong compulsion to drink. 
This goes well beyond thinking it 
would be nice to have wine with your 
meal. 

Loss of control: The inability to cease 
drinking once drinking is begun. 

Withdrawal symptoms: Nausea, sweating,
shakiness, and anxiety when alcohol 
use is stopped. 

Tolerance: The need to drink ever greater 
amounts of alcohol in order to 
experience symptoms of intoxication, 
including relaxation and euphoria. 

47 McGue, 1999. 
Brain imaging studies of alcohol abusers,
compared to age-matched controls, have 
shown something that had long been sus¬ 
pected on behavioral grounds. 48 Prolonged 
alcoholism damages the frontal lobes. 49 This
is accompanied by decrements in abstract 
reasoning and in visual-spatial reasoning, 
the sorts of behaviors that one associates 
with decreased working memory, loss of 
the ability to plan actions, and diminished 
attentional control. 50 Extremely heavy, prolonged
drinking can produce damage to the
limbic system, with an associated loss of the
ability to store new memories (Korsakoff’s
syndrome). Korsakoff’s syndrome patients
are unable to function in society, and hence 
must be hospitalized. 

The results are clearly due to alcohol 
abuse, and not to pre-existing genetic or 
familial conditions, because the effects of 
excess use of alcohol can be demonstrated 
in twins, where one twin is an abuser and 
the other is not. 51 

The issue with regard to social drinking 
is less clear. Surveys of social drinkers indi¬ 
cate that there is a small negative associa¬ 
tion between moderate social drinking and 
cognitive test scores. Not surprisingly, the 
effect is most marked among those who con¬ 
sume alcohol more than four times a week, 
and who regularly consume more than 40-50 
ml of alcohol per occasion. This is roughly 
equivalent to three or four drinks of hard 
liquor, glasses of wine, or bottles of beer. 
There are marked individual differences in 
tolerance. The critical amount varies with 
weight and with sex, women generally being 
more sensitive. In terms of psychometric 
models, the effects appear to be on g or 
Gf, depending upon whether the g-VPR or 
three-stratum model is used to describe the 

48 There are many studies showing deficiencies in 
reasoning and intelligence associated with alcohol 
abuse. Clarke & Haughton, 1975, and Leckliter & 
Matarazzo, 1989, are typical examples. 

49 Moselhy, Georgiou, & Kahn, 2001.

50 See Schottenbauer et al., 2007, for a recent example 
of such a study and note how these results, using 
imaging technology, expand but do not alter the 
conclusions of a study by Jones (1971) thirty-five 
years earlier. 

51 Toomey et al., 2003. 


results. There is not enough data to make 
a clear statement about the effects of social 
drinking upon visual-perceptual reasoning. 52 

Collinearity makes it hard to define 
causality. The extent of social drinking 
varies greatly among different racial, ethnic, 
and demographic groups. In some circles 
social drinking is an accepted and (almost) 
expected practice. In others any use of alco¬ 
hol is frowned upon. It is difficult to dis¬ 
entangle the effects of social drinking from 
inherited intellectual potential, health prac¬ 
tices, and other lifestyle variables. There is 
also a chicken-and-egg problem: does heavy 
social drinking reduce intelligence, or is it 
the case that the intelligent person does not 
drink heavily? 

This is hard to say, because there is data 
indicating that childhood intelligence tests 
(i.e., measures of cognitive power taken 
before people begin to drink) predict drink¬ 
ing patterns. People with higher test scores 
are less likely to drink to the point of hav¬ 
ing a hangover, surely an intelligent thing to 
do, and are more likely to consume wine, 
which is generally drunk more slowly and 
more likely to be drunk with meals, than 
whiskey or beer. 53 This distinction is important,
because the toxic effects of alcohol are
related to a buildup of alcohol metabolites 
in the bloodstream. This occurs when alco¬ 
hol is taken in at a faster rate than it can be 
processed through the liver. When alcohol is 
taken with food, the rate of absorption from 
the stomach to the bloodstream is reduced, 
thus lessening the influence of the drug. 

The issue is not clear, but the evidence
favors the hypothesis that repeated heavy
drinking, to the point of feeling somewhat
“high” though not necessarily to the point
of losing consciousness or marked motor
control, does lead to cognitive deficit.

The effects of alcohol on intelligence are 
extremely important on a population basis. 
According to a Centers for Disease Control

52 See Parker, Parker, & Harford, 1991, for a review and
discussion of this research.

53 Batty, Deary, & McIntyre, 2006; Mortensen,
Sorensen, & Gronbaek, 2005.
survey taken in 2006, slightly more than 60%
of US adults twelve or over described them¬ 
selves as current drinkers. About 22% of the 
adults surveyed reported at least one inci¬ 
dent of binge drinking, which was defined 
as five drinks or more per occasion. This 
is well over the danger level suggested by 
the literature. Binge drinking was heaviest 
among people eighteen to twenty-five (43% 
of the group), and within this group heaviest 
among college students. The famed (or infa¬ 
mous) weekend fraternity bash is the most 
dangerous mode of consumption a social 
drinker can engage in. 

We now turn our attention from what we 
put into our mouths to what we breathe in 
from the air. 

9.3.5. Atmospheric Lead 

Modern environmentalists worry a great 
deal about industrial pollutants in the air, 
water, and soil. The problem is not new. 
Mercury nitrate, a neurotoxin, was used in 
hat making until the twentieth century. A 
century earlier the phrase “mad as a hatter”
was in common use, for hat makers were 
thought to have delusions and poor motor 
control. Today mercury’s dangers are well 
known, and industrial exposure is tightly 
regulated. There is considerable concern 
over the presence of trace amounts of mer¬ 
cury in the food chain, especially in cer¬ 
tain fish. A much commoner pollutant, lead, 
presents a larger problem, worldwide. 

Lead was one of the first metals to be 
mined, for reasons ranging from its malleability
to its taste (!). 54 Analyses of human
remains show that prior to the beginning of
metallurgy, about 4000 BCE, the concentration
of lead in the human body was .0016
micrograms per deciliter (µg/dL). In 1975-80
the concentration in American children
was estimated to be 15 µg/dL, over 1,000
times the natural level. The concentration 
has fallen markedly since then, due largely 
to the banning of tetraethyl (“lead added”) 
gasoline. 55 

54 The Romans used lead sulfate to sweeten wines. 

55 Hubbs-Tait et al., 2005. 


The controversy over, and eventual ban¬ 
ning of, lead additives in gasoline did not 
represent a newfound concern over lead. 
Debates over the costs and benefits of using 
the metal can be traced back to the time of 
the Roman Empire. Panel 9.8 presents some 
of the history. It shows the difficulty of strik¬ 
ing a balance between the undeniable eco¬ 
nomic benefits of an activity and the equally 
undeniable cost of risks to public health. 

The earlier controversies were over the 
effects of the levels of exposure that might 
be experienced in an industrial operation 
using lead. Today's concerns are over the 
cumulative effects of exposures to much 
lower concentrations of atmospheric lead, 
and in particular the effects upon children. 
These concerns were heightened by results 
reported by Herbert Needleman of the Uni¬ 
versity of Pittsburgh in 1979. 56 Needleman 
realized that a large-scale epidemiological 
survey of lead concentrations in children's 
bodies could be conducted, in a highly 
noninvasive manner, by collecting young 
schoolchildren’s “baby teeth” after they fell 
out and analyzing them for lead content. He 
then contrasted the Wechsler test (WISC- 
R) performance of first and second grade 
children with high and low concentrations 
of lead in their teeth. Higher concentra¬ 
tions were associated with low IQ scores. 
In a second study the level of lead in the 
body was positively correlated with teachers’
reports of children’s impulsive behavior.
The researchers allowed for the effects
of a number of control variables, including 
parental SES. On its face this was an impres¬ 
sive study of over 2,000 children. 

Needleman made the dramatic claim that 
lead concentrations could be responsible for 
a six- to seven-point IQ drop in children. 
Given the substantial economic implica¬ 
tions of this assertion, it is hardly surpris¬ 
ing that his work was challenged. Some of 
these challenges amounted to accusations 
of improper manipulation of data. These 
charges were investigated, and Needle¬ 
man was exonerated of any wrongdoing. 

56 Needleman et al., 1979.
Panel 9.8. The History of Concerns
over Lead Poisoning 

Lead has been used since antiquity. Its 
widespread use was accompanied by 
unheeded warnings. 

The Romans used lead in dishes and 
in water pipes, in spite of the architec¬ 
tural engineer Vitruvius’s warning that 
lead could “rob the limbs of the virtues of
the blood.” Seventeen hundred years later 
Benjamin Franklin gave a similar warning. 
Franklin, a shrewd observer of both phys¬ 
ical and social realities, said that his warn¬ 
ing would be ignored because of the eco¬ 
nomic benefits of using the metal. In the 
early twentieth century Alice Hamilton, 
M.D., the first woman admitted to the 
Harvard faculty, documented over 500 
cases of lead poisoning. She observed that 
her findings were consistent with those of 
French authorities one hundred years ear¬ 
lier. The public health problem became 
acute with the advent of the automobile. 

Tetraethyl lead in gasoline is a cost- 
effective anti-knock agent. The first 
refineries for leaded gasoline were devel¬ 
oped after World War I by a cooperative 
effort of General Motors, Standard Oil 
Company (now Exxon), and DuPont. In 
1924 there were outbreaks of insanity, ill¬ 
ness, and death in two of the new refiner¬ 
ies. Various controversies, investigations, 
and court cases occurred from the 1920s 
through the 1940s. The automobile indus¬ 
try avoided being regulated, although 
the evidence against lead mounted and 
mounted. The debate was bitter. Alice 
Hamilton personally confronted Charles 
Kettering, General Motors vice president 
for research (and a famous automotive 
engineer), and called him a murderer! 

In the 1970s legislation was passed to 
remove lead from gasoline, due to the 
perceived public health hazards. This 
decision cost the automotive industry bil¬ 
lions of dollars, a cost that was promptly 


passed along throughout the economy. 
This did not clear the air, for the lead 
produced by automobile combustion is 
only part of the problem. Many industrial 
processes produce atmospheric lead as a 
by-product. Removal costs can run into 
millions of dollars. Therefore, there was 
further debate over whether other pro¬ 
cesses, including lead smelting, produced 
a sufficient health hazard to justify the 
economic costs of controlling emissions. 
The consensus now is that they do, and 
in the developed industrial countries the 
emission of lead into the atmosphere and 
the use of lead in home products are both 
strictly regulated. 

Nevertheless, many products used in 
the home are sources of lead. These 
include paints (especially older versions, 
still found in many homes), polyethylene 
plastic bags, and even candy wrappers. In 
November of 2007 some Chinese-made 
toys intended for the US Christmas mar¬ 
ket were found to contain unacceptable 
levels of lead. Senator (later President) 
Barack Obama, then involved in a tight 
race for the Democratic nomination for 
president, made the sweeping statement 
that if elected he would bar the impor¬ 
tation of toys from China. Obama subse¬ 
quently retreated from this promise, pos¬ 
sibly because 80% of US toys are made in 
China. 

The laws regulating lead present costs 
to industry that may reach into the 
billions of dollars. On the other side of 
the coin, the CDC has estimated that 
in the early 2000s 1.1% of US children 
under the age of five had blood lead 
concentrations higher than the current 
allowable level of 10 µg/dL. As of 2007, the
Census Bureau estimated that there were 
20.75 million children in this age bracket. 
That comes to 228,250 children who may 
be building up dangerous concentrations 
of lead. This is a significant health risk. 
The problem is confounded by the fact 
that lead sources tend to be located in
poorer residential areas, close to indus¬ 
trial operations. 
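
The figure of 228,250 children can be checked with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, and the 1.1% rate is written as 11 per thousand so that the calculation stays in whole numbers. The rate and the population count are the ones quoted above.

```python
def children_above_limit(population: int, rate_per_thousand: int) -> int:
    """Expected number of children above the allowable blood lead level."""
    # 1.1% is 11 per thousand; integer arithmetic avoids rounding error.
    return population * rate_per_thousand // 1000


# 20.75 million children under five, 1.1% above the allowable level.
print(children_above_limit(20_750_000, 11))  # 228250
```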

It would be impossible, and inap¬ 
propriate, for a book on intelligence to 
attempt to weigh the relative costs and 
benefits of lead use, or to comment on the 


costs and benefits of similar industrial- 
public health trade-offs. What we can do 
is examine the research that has caused 
us to be concerned. 

The information in this summary comes from 
Hubbs-Tait et al., 2005, and Kovarik, 2005.


Collinearity posed a more difficult problem 
in interpretation. Was the poor test perfor¬ 
mance of children with higher lead con¬ 
centrations caused by the concentration of 
lead, or did they perform poorly because 
they tended to come from poorer homes 
and/or to be minority group members, and 
therefore would be expected, on statistical 
grounds, to have lower scores on intelligence 
tests? The best way to investigate this issue is 
by replication of a result, in situations where 
the collinearity problems plaguing the orig¬ 
inal finding do not occur. 

Several such replications have been con¬ 
ducted. One is a longitudinal study of chil¬ 
dren living near industrial operations in 
Port Pirie, Australia. 57 A second longitudi¬ 
nal study following up children from two 
to seven was carried out in Kosovo in the 
1990s. 58 In both these studies investigators 
obtained maternal IQ scores and took exten¬ 
sive measures of the home environment. In 
the Kosovo study two separate populations 
were studied, children living in a relatively 
large town near a lead smelter and chil¬ 
dren in a similar town, twenty-five miles 
away, where the atmospheric lead expo¬ 
sure level was much lower. Pregnant moth¬ 
ers were enrolled, prenatal lead concentra¬ 
tions were estimated, and children were fol¬ 
lowed up until they were seven in order 
to assess the effects of a buildup of lead 
in the body. Both prenatal levels of lead 
and postnatal increases were associated with 
drops of intelligence after all other vari¬ 
ables were considered. There was no evi- 

57 Baghurst et al., 1992.

58 Wasserman et al., 2000. The study was carried out
at a time when Kosovo was part of Serbia, and was 
not disrupted by war. 


dence of threshold effects, which provides 
an argument against maintaining a permissible
concentration level of 10 µg/dL, as is now
done. 

Lead concentrations were statistically 
associated with just over 4% of the variance 
in children’s IQ test scores. Similar results 
have been obtained in other studies in the 
US and in South America, so we have a 
highly generalizable finding. 59 

In 2005 a consortium of researchers 
involved in these studies published an analy¬ 
sis of the international findings. 60 They con¬ 
cluded that there is a non-linear relationship 
between intelligence test [IQ] scores and the 
level of lead in the blood. Their guide was 


Increment in lead      Expected loss 
in blood               in IQ points 

2.4 to 10 μg/dL        3.9 
10 to 20 μg/dL         1.9 
20 to 30 μg/dL         1.1 


The losses are cumulative, so the expected 
loss for a child with a 30 μg/dL concentration 
of lead in the blood would be 6.9 IQ points, 
Needleman’s original estimate. 
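
The cumulative reading of the guide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name and the rule of counting whole ranges (no interpolation within a range) are assumptions made here, and the first entry is read as 3.9 points, the value consistent with the 6.9-point total.

```python
# Expected IQ loss per blood-lead range, as listed in the guide above.
# Each entry: (lower bound, upper bound, loss in IQ points), bounds in ug/dL.
LOSS_BY_RANGE = [
    (2.4, 10.0, 3.9),
    (10.0, 20.0, 1.9),
    (20.0, 30.0, 1.1),
]


def cumulative_iq_loss(concentration):
    """Sum the losses of every range that lies entirely at or below `concentration`.

    Illustrative assumption: a partially covered range contributes nothing,
    because the guide gives no rule for interpolating inside a range.
    """
    total = 0.0
    for _lower, upper, loss in LOSS_BY_RANGE:
        if concentration >= upper:
            total += loss
    return round(total, 1)


# Loss expected at the top of the highest range in the guide.
print(cumulative_iq_loss(30))
```

Summing the three entries gives the total quoted in the text for a concentration of 30 μg/dL.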

A drop of five to seven IQ points is not 
serious on an individual basis. We would not 
expect to find a great deal of difference in 
the cognitive capabilities of two children, 
one with an IQ of 100 and the other with an 
IQ of 95. On a population basis, though, this 
would be a serious issue, because of changes 

59 Wasserman et al., 2000, p. 815. See related work in 
the United States by Chiodo et al. (2007) and in 
South America by Counter, Buchanan, & Ortega 
(2005). 

60 Lanphear et al., 2005. 
Table 9.1. Levels of blood lead concentration in children, together with recommended 
actions 

Class   Blood Lead       Comment 
        Concentration 
        (μg/dL) 

I       Less than 10     Not lead poisoned. 

IIA     10-14            Children in Class IIA should be screened frequently. If many 
                         children in a community test in this range, community-wide 
                         preventive measures should be taken. 

IIB     15-19            A child in Class IIB should receive nutritional and educational 
                         interventions and more frequent screening. Environmental 
                         investigations and interventions should be initiated. 

III     20-44            A child in Class III should receive a medical evaluation. The child 
                         may require pharmacological treatment for lead poisoning. 
                         Environmental evaluation and remediation is called for. 

IV      45-69            A child in Class IV requires medical and environmental 
                         intervention, including chelation therapy (a technique for removing 
                         lead from the body). 

V       70 or above      The child is suffering from lead poisoning. This is a medical 
                         emergency. Medical intervention and environmental management 
                         must begin immediately. 

Note: Information provided by the US Center for Disease Control brochure “Preventing Lead 
Poisoning in Young Children,” published in October 1991. The commentary has been paraphrased 
from comments in the brochure. 


in the frequency of exceptionally high and 
low levels of intelligence. Consider a pop¬ 
ulation of a thousand children. If the mean 
IQ in the population were 100, we would 
expect to find approximately fifty children 
with IQs below 70, which is often consid¬ 
ered an indication for enrollment in a special 
education program, and an equal number of 
children with IQs above 130, in some pro¬ 
grams a marker for entry into a gifted educa¬ 
tion curriculum. If the mean of the popula¬ 
tion dropped to 95, we would expect to find 
about one hundred children with IQs below 
70, and only twenty-five with IQs above 130. 
In other words, a drop of five IQ points in 
the population average would be associated 
with a doubling of the number of children 
in the special education program, while the 
number of children eligible for the gifted 
program would drop by a half. 
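
The effect of a shift in the mean on the tails of a distribution can be sketched numerically. The example below assumes that IQ scores are normally distributed with a standard deviation of 15; that scaling is a conventional assumption introduced here, not a figure taken from the text, and the function name is hypothetical. The absolute counts it produces depend on that assumption and need not match the round figures used in the text; the point of the sketch is the ratio between the two populations.

```python
from statistics import NormalDist

SD = 15            # assumed standard deviation of IQ scores (not stated in the text)
POPULATION = 1000  # children in the hypothetical population


def expected_counts(mean, low_cut=70, high_cut=130):
    """Return the expected numbers of children below low_cut and above high_cut."""
    dist = NormalDist(mu=mean, sigma=SD)
    below = POPULATION * dist.cdf(low_cut)
    above = POPULATION * (1 - dist.cdf(high_cut))
    return below, above


below_100, above_100 = expected_counts(100)
below_95, above_95 = expected_counts(95)

# Ratios of tail counts after a five-point drop in the population mean.
print(f"below 70:  {below_100:.1f} -> {below_95:.1f} (ratio {below_95 / below_100:.2f})")
print(f"above 130: {above_100:.1f} -> {above_95:.1f} (ratio {above_95 / above_100:.2f})")
```

Under these assumptions the low tail roughly doubles and the high tail falls to somewhat less than half, which is the pattern the paragraph describes.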

These research findings have had an effect 
on public policy, at least in the industrial and 


postindustrial countries. In 1960 the border 
between safe and unsafe exposure levels was 
60 μg/dL, twice the level used as a point of 
concern in the 2005 comprehensive review. 
Table 9.1 shows the current guidelines from 
the US Center for Disease Control (CDC). 
Actions to reduce lead concentrations are 
now recommended when children’s blood 
levels exceed 10 μg/dL, and a blood level 
above 45 μg/dL is seen as a medical emergency. 
An advisory committee report publicized by 
the CDC has indicated that concentrations 
higher than 10 μg/dL can warrant actions to 
reduce the levels of lead in the home or near 
schools. 61 

Unfortunately, significantly higher levels 
of atmospheric lead are found in some devel¬ 
oping countries. For instance, in the indus¬ 
trial port of Callao, near Lima, Peru, lead 
storage areas are located near one of the 

61 CDC Advisory Committee, 2007. 
poorest residential areas of the city. A survey 
found that approximately half of the early 
elementary school children had blood lead 
levels of 20 μg/dL or higher. 62 

Lead in the atmosphere clearly represents 
one of the major threats to the development 
of intelligence in some areas of the world 
today. 

9.4. The Social Environment 

The social environment is an important 
determiner of cognitive power. Charles 
Murray has called the development of certain 
ways of thinking “meta-inventions,” 
because societies and individuals who use 
these ways of thinking have a huge advantage 
over those who do not. 63 Literacy 
is probably the most important of the 
meta-inventions, because it fosters abstract 
thinking and provides continuity with the 
past. Mathematical reasoning and scientific 
approaches to problem solving also improve 
thinking. In our society the intelligent per¬ 
son is one who has a good grasp of these 
tools of thought. 

Americans spend a good deal of time and 
effort trying to ensure that their children 
become familiar with the tools of intelli¬ 
gence. The effort literally begins at home. 
The Disney Corporation’s Baby Einstein 
DVD programs for three-month- to three- 
year-olds are supposed to enhance intelli¬ 
gence in toddlers. Whether they do so is 
questionable. 64 However, viewing quality 
children’s television programs, such as the 
Sesame Street series, does improve cognitive 
skills in pre-schoolers. 65 The United States 
spends more money per student on K-12 edu¬ 
cation than any other country. Nevertheless, 
panels of business and government leaders 
regularly decry the (alleged) fact that Amer¬ 
ican schools are failing to produce people 
who can solve problems and face new cognitive 
challenges - the very definition of fluid 
intelligence. 

62 Guerrero, 2009, Table 02. 

63 Murray, 2003. 

64 Zimmerman, Christakis, & Meltzoff, 2007. 

65 Anderson, 1998. 

The problem is not to show that the social 
environment influences the development of 
intelligence; the problem is to find out how 
it does so. This is not easy. 

Social variables are easy to name, but hard 
to define. “Good parenting” is something we 
all applaud, but just exactly what is a good 
parent? Socioeconomic status (SES) is real. 
But how do you measure it? Income is often 
used as a proxy. In 2008 the Chief Justice of 
the United States, John Roberts, had a salary 
of $217,000. Alex Rodriguez, third baseman 
for the New York Yankees baseball team, 
had a salary of $28,000,000. Who had higher 
socioeconomic status, the chief justice or the 
third baseman? 

Measurement is not the only problem. 
As has been pointed out earlier, we do not 
have a theory of the social environment that 
approaches the clarity of Mendel’s theory 
of genetic inheritance. Nor do we have any 
broad theoretical approach to environmen¬ 
tal issues that can play a role similar to the 
role that Darwin’s theory of evolution has 
in biology. Lack of a comprehensive theory 
of the environment has resulted in a great 
many ad hoc studies without a great deal 
of accumulated knowledge. The lack of an 
adequate theory has also made it hard to 
understand the relations that we do observe. 
Social variables are highly collinear with 
each other, with genetic measurements, and 
with measurements of the physical environ¬ 
ment. Nutrition is linked to socioeconomic 
status, especially in the developing world. 
Parents who produce favorable home envi¬ 
ronments for their children are likely to seek 
out the best school environments. In modern 
developed countries (and in urban districts 
worldwide) residence is closely tied to SES 
and sometimes to ethnic status. This con¬ 
strains the composition of children’s play 
groups. Without a theory it is difficult to 
develop models that can guide the selection 
of variables to study. 

These are real problems. Nevertheless, 
psychologists and educators have learned 
something about social environments that 
nurture or restrict intelligence. 
9.4.1. Socioeconomic Status and 
Intelligence 

Socioeconomic status (SES) refers to the 
obvious, but nebulous, concept that there 
are classes. There is no generally agreed- 
upon definition of what a class is, so the 
idea is clearly a fuzzy one. The commonest 
practice is to define anywhere from three 
to six different classes, based upon com¬ 
binations of income, education, and occu¬ 
pational prestige. It is not unusual to find 
studies in which a single variable, such as 
income or education, is used as a proxy for 
SES. Given the variety of definitions, it is 
a bit surprising to find that SES is corre¬ 
lated in the .30-.40 range with performance 
on such diverse variables as the WAIS tests, 
the SAT, and Raven’s progressive matrix 
tests. 66 However, the direction of causality 
is far from clear, and it is probably bidi¬ 
rectional. There is a correlation of slightly 
over .40 between parental SES and SAT 
scores. 67 The same thing is true for other 
measures of intelligence, such as IQ tests 
and academic performance. 68 These findings 
suggest that SES is causal to intelligence. 
At the same time, test scores obtained in 
adolescence correlate with a person’s own 
SES roughly twenty years later, which is 
consistent with the argument that intelli¬ 
gence causes SES. 69 As a further compli¬ 
cation, test scores and SES are both cor¬ 
related with parental advantages of various 
sorts, including parental, and hence one's 
own, genetic constitution. Parental advan¬ 
tage, carried forward across the generations, 
could, at least in theory, be a cause for 
one’s own socioeconomic success. And just 
to make things even more confusing, cog¬ 
nitive tests such as the SAT are used as 
screening devices in education, raising the 
possibility that measures of intelligence act 
as a gatekeeper for access to resources that 
determine social success, but that intelli¬ 
gence itself has little causal influence. 


66 Ceci & Williams, 1997; Raven, 1989. 

67 Sackett et al., 2009. 

68 Teasdale & Owen, 1986; Zwick & Green, 2007. 

69 Herrnstein & Murray, 1994. 


The problem is that SES is too global 
a measure of either parental influence or 
one's own success in life. In order to under¬ 
stand the influence of social class upon intel¬ 
ligence we have to take a finer look at 
phenomena that underlie the correlation. 
Three classes of studies have been used to 
evaluate the effects of the social environ¬ 
ment: adoption studies, studies involving 
social interventions, and multivariate analyses 
of specific features in the environment. 
We consider each of them in turn. 

9.4.2. Adoption 

As we saw in Chapter 8, adoption studies 
are frequently cited as supporting a genetic 
cause for intelligence, on the grounds that 
correlations between measures of cogni¬ 
tion in adoptees and their biological parents 
are higher, and usually substantially higher, 
than the correlations between adoptees and 
their adoptive parents. A different statis¬ 
tic, the mean intelligence test scores of 
adoptees, can be cited in support of envi¬ 
ronmental influences. 

An early, much-cited British study in the 
1930s produced what turned out to be fairly 
typical results. 70 The biological parents were 
all “working-class,” as defined in Britain at 
that time. The mothers, as a group, had an 
estimated mean IQ of 86. The adopting par¬ 
ents were all described as well educated. 
Given this information, one would expect 
the adoptees to have IQ scores in the 90s, 
somewhat higher than the mothers’ IQs, 
but still below the population mean of 100. 71 
This is not what happened. The mean IQs 
were 117 at age two, and 108 at thirteen. This 
is evidence for an effect of the childhood 
home environment. However, the correla¬ 
tions between the biological mother’s level 
of education and adoptee's IQ scores were 

70 Skodak & Skeels, 1949. 

71 The reason for this is a statistical phenomenon 
known as “regression toward the mean.” Whenever 
an extreme score is observed on a test of less-than- 
perfect reliability, the best estimate of the score that 
would be obtained upon retesting, under exactly the 
same conditions, would be a score between the orig¬ 
inal extreme score and the population mean. 




Table 9.2. Mean IQ scores of adopted children as 
pre-schoolers and as adolescents, compared to scores 
by biological children of the adopting family 

Group                       Time 1 (1976)   Time 2 (1986) 

Biological children         116.4           109.4 
Adopted African American    106.1           98.1 
Adopted White               117.6           105.6 

Source: Data excerpted from Weinberg, Scarr, & Waldman, 1992, 
Table 2. 


.04 at age two and .31 at age thirteen, reflect¬ 
ing what was to be a typical finding in later 
studies: measures of genetic influences upon 
intelligence rise as children grow older. 

At this point, it may be helpful to look 
back to Figure 9.1. If we substitute “adoptive 
home” and “biological home” for “encourag¬ 
ing environment” and “restricting environ¬ 
ment,” the figure illustrates how an environ¬ 
mental effect upon mean scores could occur 
along with a correlation between test scores 
and measures of genetic potential. 

We then “fast forward” almost forty years, 
to the Minnesota Trans-Racial Adoption 
(MTRA) study conducted in the 1970s in the 
American Midwest. The MTRA study con¬ 
trasted the test scores of African American 
and White children who had been adopted 
into upper-middle-class White homes. Both 
adoptees and biological children of the 
adoptive family were tested, first when 
the adoptees were pre-schoolers and then 
ten years later, when the adoptees were 
adolescents. 72 Table 9.2 shows the results. 
Biological children consistently outscored 
adopted children in the transracial but not 
in the intraracial groups. All test scores 
declined somewhat from childhood to ado¬ 
lescence. This could be caused by a vari¬ 
ety of factors, including the fact that the 
tests used differed somewhat over time. The 
African American adoptees had an aver¬ 
age score near or above 100 (the putative 
national mean) and well above the score of 
85 typically found in African American pop¬ 
ulations on this type of test. 

72 Scarr & Weinberg, 1976; Weinberg, Scarr, & Waldman, 1992. 


The practice of comparing the adoptees’ 
obtained scores to maternal scores or to 
population expectations can be criticized. 
Suppose that the birth mother’s IQ is 90. 
This does not mean that a child would 
be expected to have an IQ of 90, for 
two reasons. No allowance has been made 
for father's intelligence, which is typically 
unknown. In addition, regression to the 
mean (see footnote 71) implies that the 
child’s IQ will be closer to the appropriate 
population mean than is the mid-parent IQ. 
But what is the appropriate mean? Moth¬ 
ers who give up their children for adoption 
are hardly a randomly selected group of all 
women, or even of all women in the appro¬ 
priate racial, ethnic, or educational group. 
A better way is needed to estimate the 
expected IQ of adopted children. 
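
The dependence of the expected score on the choice of reference mean can be made concrete with the standard linear regression prediction. The sketch below is illustrative only: the function name, the regression coefficient of 0.5, and the alternative reference mean of 85 are hypothetical values chosen for the example, not estimates from any study discussed here.

```python
def expected_score(observed, population_mean, regression_coefficient):
    """Predict a score that regresses from `observed` toward `population_mean`.

    A coefficient of 1 means no regression; a coefficient of 0 means the
    prediction is simply the population mean.
    """
    return population_mean + regression_coefficient * (observed - population_mean)


PARENT_IQ = 90       # the example value used in the text
COEFFICIENT = 0.5    # hypothetical strength of the parent-child relationship

# The same parental score yields different predictions for different reference means.
print(expected_score(PARENT_IQ, 100, COEFFICIENT))  # regressing toward a mean of 100
print(expected_score(PARENT_IQ, 85, COEFFICIENT))   # regressing toward a hypothetical mean of 85
```

With identical inputs apart from the reference mean, the prediction moves up in one case and down in the other, which is why the question of the appropriate mean cannot be set aside.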

One way of doing this is to compare the 
cognitive performance of adoptees to the 
cognitive performance of children who have 
not been adopted, but who might have been. 
The test scores of adopted children can be 
compared either to those of unadopted sib¬ 
lings or to unrelated, unadopted children in 
the same pool of potential adoptees. When 
this is done the results show a positive adop¬ 
tion effect of slightly less than one stan¬ 
dard deviation unit, both for test results 
and for measures of school performance. 73 
This is probably an overestimate of the 
adoption effect, because the comparison 
includes the effects of any tendency lead¬ 
ing to adoption of apparently more favorable 
children. However, given that most children 
are adopted as infants, and that standard 

73 Van Ijzendoorn, Juffer, & Poelhuis, 2005. 
Table 9.3. Mean WISC-R scores as a function of SES of birth and 
adopting parents. The number in parentheses is the number of cases. 

                          Adopting Parents   Adopting Parents 
                          High SES           Low SES            Average 

Birth parents high SES    119.6 (10)         107.5 (8)          114.2 
Birth parents low SES     103.6 (10)         92.4 (10)          98.0 
Average                   111.6              99.1 

Source: Based on data reported by Capron & Duyme, 1989. 


developmental inventories do not do a good 
job of predicting later intelligence, this effect 
is likely to be small. Except for pathological 
cases, adoption agencies would have a diffi¬ 
cult time identifying infants who were going 
to be bright or dull fifteen years later. 

Understandably, most studies stress the 
positive effects of adoption. Negative effects 
are conceivable. It should be possible to 
move intelligence upward or downward by 
adoption, depending upon the relative SES 
of the biological and adopting parents. Such 
an analysis requires an unusual situation, for 
high SES parents are less likely to give up 
their children than are low SES parents, and 
adoption agencies are more likely to place 
adoptees with high rather than low SES 
families. However, the unusual does occur. 
French researchers were able to locate a rela¬ 
tively small number of young children (eight 
to ten per cell) in a study that did fit the 
appropriate design. 74 Table 9.3 shows the 
IQ scores achieved on the Wechsler Intel¬ 
ligence Scale for Children (Revised) at an 
average age of fourteen years. 

Taken at face value, the data in Table 9.3 
indicates a high birth SES-low birth SES 
effect of about 16 IQ points, and a high 
adopting SES-low adopting SES effect of 
12.5 points. In other words, the data is con¬ 
sistent with both hereditarian and environ¬ 
mental effects on intelligence, and makes 
the important point that the two causes are 
not mutually exclusive. I think it would be 
unwise to go much beyond this, for the size 
of the effects certainly should not be gener¬ 
alized. The study is a small one, and it is not 

74 Capron & Duyme, 1989, 1996. 


clear that the difference between the adop¬ 
tive high and low SES groups was equivalent 
to the difference between the high and low 
SES birth parents. 
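
The two effects quoted for Table 9.3 can be recovered from the cell means. The sketch below assumes that the marginal averages are weighted by the number of cases in each cell, an assumption that reproduces the averages printed in the table; the names used are hypothetical.

```python
# Cell means and case counts from Table 9.3: (birth SES, adopting SES) -> (mean, n)
CELLS = {
    ("high", "high"): (119.6, 10),
    ("high", "low"): (107.5, 8),
    ("low", "high"): (103.6, 10),
    ("low", "low"): (92.4, 10),
}


def marginal_mean(factor_index, level):
    """Case-weighted mean over all cells whose factor at `factor_index` equals `level`.

    factor_index 0 selects on birth-parent SES, 1 on adopting-parent SES.
    """
    selected = [(mean, n) for key, (mean, n) in CELLS.items() if key[factor_index] == level]
    total_n = sum(n for _, n in selected)
    return sum(mean * n for mean, n in selected) / total_n


birth_effect = marginal_mean(0, "high") - marginal_mean(0, "low")
adoptive_effect = marginal_mean(1, "high") - marginal_mean(1, "low")

print(round(birth_effect, 1), round(adoptive_effect, 1))
```

With only eight to ten cases per cell, these differences carry wide uncertainty, which is the reason for caution about generalizing their size.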

The results of adoption studies have been 
used to argue for both environmental and 
genetic influences on intelligence. People 
who want to emphasize genetic causes cite 
the fact that indices of adoptees' cognitive 
competence are better predicted by biological 
parents' competence than by adoptive 
parents' competence as evidence for the 
importance of genetics. People who want to 
emphasize environmental causes cite gains 
in intelligence achieved by adoptees. The 
debate over how to interpret these find¬ 
ings can be heated. In some of the studies 
put forward to support the genetic posi¬ 
tion the authors do not report changes in 
mean scores, while in studies put forward 
to support the environmental positions the 
authors do not report parent-adoptee cor¬ 
relations. This practice more resembles the 
behavior of a lawyer presenting the evidence 
for a client than the behavior of a scientist 
reporting data to be considered in evaluating 
theories. 

In fact, there is no conflict between the 
results. Erik Turkheimer, of the Univer¬ 
sity of Virginia, offered an analysis that 
brings both these results into the same 
framework. 75 His analysis was based upon 
the concept of reaction range, as discussed 
in Chapter 1, section 5, and illustrated in 
Figure 9.1 of this chapter. Turkheimer devel¬ 
oped a mathematical model that separates 
the genetic and environmental effects in 
circumstances such as those illustrated in the 

75 Turkheimer, 1991. 
figure, and used it to reanalyze a study contrasting 
adopted and nonadopted siblings. 76 
The original authors had concluded that 
there was a major effect of social class, on 
the grounds that adoptees had higher intel¬ 
ligence scores than their nonadopted sib¬ 
lings, and that the adopting parents had 
higher occupational status than the biologi¬ 
cal parents. According to Turkheimer’s anal¬ 
ysis, which treated the within-group and 
between-group effects in a single frame¬ 
work, there was no reliable effect of a direct 
measure of adoptive SES - father’s occupa¬ 
tional status - but there was still a large (and 
unexplained) effect in favor of the adopted 
children. How could this occur? 

While various explanations of this finding 
have been offered, I believe that the ambi¬ 
guities in studies like this will not be cleared 
up until we take a finer look at environ¬ 
mental measures. Variables like “adoption,” 
“socioeconomic status,” and even paternal 
intelligence and occupation are distal vari¬ 
ables with respect to the development of 
intelligence. Nutrition, schooling, and par¬ 
enting practices are proximal variables, act¬ 
ing directly upon intellectual development. 
What we need to do is to look at some prox¬ 
imal effects. 

9.4.3. The Home Environment 

Parenting practices are evaluated by rating 
homes on such things as the general order¬ 
liness of the home, the amount of read¬ 
ing or other explicitly educational material 
available, the number and quality of interac¬ 
tions between parents and children, and the 
extent to which children are encouraged to 
work out the answers to questions and puz¬ 
zles, as opposed to being told how to do 
so. It appears that the best environment for 
intellectual development is one in which the 
child is encouraged to work out problems 
with guidance and support from parents, as 
opposed to an authoritarian setting in which 
the parent tells the child what to do, or 
a laissez-faire setting in which the child is 
pretty well left on his or her own. Parenting 

76 Schiff & Lewontin, 1986. 


styles, and especially substandard parent¬ 
ing practices, are statistically associated with 
indices of socioeconomic status and family 
solidarity, such as income and whether the 
child is in a one-parent, father-absent, or 
conventional mother-father home. 77 These 
variables are important, for the home envi¬ 
ronment has a great deal to do with intellec¬ 
tual development in young children. This 
has been shown by two interesting lines of 
research. 

The first line involves observation of the 
influence of the home environment upon 
measures of children’s intelligence, includ¬ 
ing but not limited to test performance. 
In one such study Victoria Molfese and 
her colleagues at Southern Illinois Univer¬ 
sity followed 121 children, none of whom 
had experienced extreme neonatal or birth 
risks, from age three to age five. 78 The 
children were given intelligence tests annu¬ 
ally. Family SES was determined by com¬ 
bining indices of education, occupation, 
and income. The home environment was 
determined by observation and rating, using 
the widely accepted Home Observation for 
Measurement of the Environment (HOME) 
scale, which requires observation of the 
child’s home, including the provision and 
use of reading material for children and 
reports of the manner in which adults inter¬ 
act with the child. 79 Figure 9.5 shows the 
combined and independent contributions of 
SES and HOME ratings to the prediction 
of children's WISC scores. While there are 
some irregularities, there is a trend toward 
a decreasing influence of home environ¬ 
ment and increasing influence of SES as 
the children aged. This is consistent with 
behavior genetic studies that show generally 
increasing influences of genetic heritage and 
lowered influence of home environment 

77 Kotchick & Forehand, 2002. Kotchick and Forehand 
make the interesting point that if a family lives in a 
potentially threatening environment, an authoritar¬ 
ian, controlling style of raising young children may 
be adaptive, even though it does not foster intellec¬ 
tual development, because of the need to protect 
the child. 

78 Molfese, DiLalla, & Bunce, 1997. This article also 
contains references to a number of related reports. 

79 Bradley, 1993. 
Figure 9.5. The fraction of variance in WISC scores predictable 
from measures of SES and home environment. For reference, a 
fraction of .04 is equivalent to a correlation of .2. Calculations 
based on data in Molfese, DiLalla, & Bunce, 1997, Table 2. 


upon intelligence as people age, albeit over 
a much greater time span than was the case 
in the Molfese and colleagues study. 
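
The caption of Figure 9.5 equates a variance fraction of .04 with a correlation of .2. For a single predictor the two are related by squaring, as the following sketch shows; the function names are hypothetical, and the conversion assumes a simple linear relationship with one predictor.

```python
import math


def correlation_from_variance_fraction(fraction):
    """Magnitude of the correlation implied by a fraction of variance explained."""
    return math.sqrt(fraction)


def variance_fraction_from_correlation(r):
    """Fraction of variance explained by a single predictor with correlation r."""
    return r ** 2


print(round(correlation_from_variance_fraction(0.04), 3))
print(round(variance_fraction_from_correlation(0.4), 3))
```

The same conversion helps in reading other figures in this chapter, since a modest-sounding share of variance can correspond to a correlation of noticeable size.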

Two more studies illustrate the wide¬ 
spread influence of the home environment 
during the early years. Both results come 
from studies previously cited to make points 
about the physical environment: observa¬ 
tions of rural Filipino children and observa¬ 
tion of children growing up in Kosovo, in the 
Balkans. In the Philippine study, used earlier 
to indicate the relation between nutritional 
status and intelligence, ratings of home envi¬ 
ronments similar to the HOME scale added 
to the prediction of a child’s intelligence, 
after allowance for a variety of other vari¬ 
ables, including maternal intelligence test 
score, paternal education, and physical envi¬ 
ronmental variables. 80 In the Kosovo study 
of atmospheric lead, the HOME score had 
the highest correlation with children's intel¬ 
ligence test scores of all the variables consid¬ 
ered, which included measures of maternal 
intelligence (Raven matrix score) and mater¬ 
nal education. 81 

The American Midwest, the rural Philip¬ 
pines, and the Balkans are very different 
places. When relations between any variable 
and intelligence are consistently related over 

80 Church & Katigbak, 1991, Table 6. 

81 Wasserman, 2000. 


such diverse settings, one has to pay atten¬ 
tion. The microenvironment of the home 
has a substantial influence on intelligence 
during the early childhood years. A good 
home environment gives a child a leg up 
as he or she enters school. And, as will 
be shown shortly, schooling counts as well. 
Things mount up. 

9.4.4. Early Childhood Interventions 
in “at Risk” Populations 

There have been substantial attempts to 
improve the environments of children who 
are thought to be “at risk” for showing 
poor cognitive development. Since the 1960s 
the US Government has funded Head Start 
programs for pre-schoolers from low SES 
families. Head Start was motivated by a con¬ 
cern that low SES, and particularly African 
American, children entered primary school 
lacking a number of cognitive and social 
skills that were thought to be important in 
children’s adjustment to the school experi¬ 
ence. These programs typically involve pre¬ 
school training for a half-day, five days a 
week, for the year prior to entering school. 
Many Head Start programs also include 
some form of parental education, as it is felt 
that inadequate parental support may be a 
factor leading to the children's poor perfor¬ 
mance in schools. 
Progress toward the goal of improving 
cognition has been spotty, at best. In 1969 
Arthur Jensen, a Professor of Education at 
the University of California, Berkeley, pub¬ 
lished a controversial paper in the Harvard 
Educational Review. He began 

Compensatory education has been tried 
and apparently it has failed. 

Jensen, 1969, p. 2 

Later in the article he said 

The evidence so far suggests the tentative 
conclusion that the payoff of preschool and 
compensatory programs in terms of IQ 
gains is small. 

And further down on the same page: 

The techniques of raising intelligence per se 
in the sense of g, probably lie more in the 
province of the biological sciences than in 
psychology and education. 

Jensen, 1969, p. 108 

Jensen maintained this position, vir¬ 
tually without change, in an important 
book on intelligence published almost thirty 
years after the Harvard Educational Review 
article. 82 

A similar conclusion was echoed in 1994 
by Herrnstein and Murray, in a book that 
was received as contentiously as Jensen’s 
earlier conclusions had been. 83 

The school is not a promising place to try 
to raise intelligence or to reduce intellec¬ 
tual differences, given the constraints on 
school budgets and the state of educational 
science. 

Herrnstein and Murray, 1995, p. 414 

Quite a different view has been expressed 
by Edward Zigler, a Yale Professor who, 
as a government official, became famous as 
the “Father of Head Start.” Zigler observed 
that a variety of behaviors are improved by 
Head Start programs. These include good 
study habits, cooperative work skills, and 
improved parental support. Improving these 
characteristics was, according to Zigler, at 

82 Jensen, 1998. 

83 Herrnstein & Murray, 1994. 


least as important as improving the cogni¬ 
tive skills measured by test scores. 84 

The debate over Head Start is more 
nuanced than it appears when people sim¬ 
ply recite mantras that Head Start programs 
do (or don’t) work. Critics like Jensen and 
Herrnstein and Murray generally focus on 
IQ and similar tests as essentially complete 
measures of intelligence, while the support¬ 
ers have a more expanded view of intelli¬ 
gence, to include things like knowing how 
to manage time and how to learn arbitrary 
material - “associative learning," in Jensen’s 
terms. The supporters also stress improve¬ 
ment in the child’s general social situation, 
as Zigler's comment about parenting skills 
indicates. Such concerns, which strike me 
as being quite legitimate criteria by which 
to evaluate a pre-school program, are not 
considered in the criticisms raised by Jensen 
and Herrnstein and Murray. 

There is another qualification. Jensen and 
Herrnstein and Murray did not say that pre¬ 
school programs will not work; they said 
that those programs that were economically 
realistic did not work. That is an important 
distinction. “What early childhood programs 
improve intelligence?” is a question for edu¬ 
cational psychologists. “What could work at 
a cost of x dollars per child?” is an important 
question for educational policy makers. The 
two viewpoints are not the same. In his book 
on the improvement of intelligence, Nisbett 
concluded that 

Several early childhood education pro¬ 
grams actually do produce large immediate 
gains in IQ, as well as long term gains in 
IQ or academic achievement, or both. 

Nisbett, 2009, p. 120 

Two meta-analytic reviews 85 show that 
the picture is not as bleak as Jensen painted 
it, but that Nisbett’s statement may be a bit 
optimistic. 

There is a great deal of variation among 
pre-school intervention programs. Expendi¬ 
tures cover an eight-to-one ratio. At the low 
end we have minimal head-start programs, 

84 Zigler & Styfco, 1997. 

85 Barnett, 1998; Gorey, 2001. 
in which children attend reasonably well-run 
pre-school programs with educational 
components from two to five times a week 
for one or two years. At the high end some 
intense programs have included staff/client 
ratios as high as one well-trained teacher 
for every three students, parental counsel¬ 
ing, and interventions for five years or more. 
In general, the meta-analyses show that 
the less intense programs have short-term 
results, but that these results (in terms of 
IQ scores and school accomplishment) fade 
away quickly. By contrast, the more intense 
programs have results that last for years, in 
terms of both intelligence test scores and, 
more importantly, school achievement. One 
meta-analysis reports an improvement of 
nine IQ points for the intensive programs, 
five years after the programs ended. This 
would be somewhere toward the end of ele¬ 
mentary school for most participants. Only 
a few studies have followed students as far 
as high school, although two very intense 
intervention studies traced participants into 
adulthood. These studies report substantial 
positive effects on social behaviors, such as 
encounters with the law, and smaller effects 
on IQ scores more than ten years after pre¬ 
school participation. 

Possibly the most intensive program, the 
Abecedarian project, was aimed at a deeply
impoverished group of students in North 
Carolina. This study has reported positive 
results, compared to a control group, at age 
twenty-one. 86 Because this is generally con¬ 
ceded to be one of the most effective (and 
expensive) of the pre-school programs, it is 
worth looking at the study in more detail. 

The participants were children from low 
SES families in North Carolina. Virtually 
all the children were African American. 
Slightly over one hundred children were 
assigned to the special program or to a 
control group. The intervention began at 
an average age of 4.4 months and contin¬ 
ued until the children entered kindergarten. 
Depending upon the period, there was one 
instructor for every three to six children. 
In addition to instruction and supervision, 

86 Campbell et al., 2001. 


a nutritional program was developed and 
offered to both the experimental and con¬ 
trol group. The pre-school lasted virtually 
the entire day, five days a week. Participants 
were followed until they were twenty-one. 

When the program ended, at age five, the 
children took the Wechsler Preschool Intel¬ 
ligence test. The experimental group had 
a mean IQ of 100 and the control group 
a mean IQ of 94. (Both scores are some¬ 
what above what would be predicted from 
demographic data.) Figure 9.6 shows the 
results of subsequent testing. There are two 
striking features of the graph. Both groups 
showed a steady decline in test scores over 
the years after leaving the project. At the 
same time, the experimental group main¬ 
tained a roughly five IQ points (d = .3) 
advantage over the control group through¬ 
out. How could this come about? 
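
The d value quoted above is a standardized mean difference. As a rough check, and assuming the conventional IQ-scale standard deviation of 15 (an assumption made here for illustration; the study's own sample standard deviation may differ), a five-point gap works out to about one-third of a standard deviation:

```python
def cohens_d(mean_difference, sd=15.0):
    """Standardized mean difference.

    sd=15 is the conventional IQ-scale standard deviation, assumed
    here for illustration rather than taken from the study itself.
    """
    return mean_difference / sd


# Roughly five IQ points separating experimental and control groups.
print(round(cohens_d(5), 2))
```

On the same assumption, the six-point gap at age five (100 versus 94) corresponds to d = .4.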

The decline is not surprising. IQ tests for 
children below the age of five are at best 
moderately predictive of scores at a later 
age. This could be because the tests for 
very young children are tests of cognitive 
functions different from those evaluated by 
later tests, or it could be because the cogni¬ 
tive systems important for test performance 
(e.g., working memory) are only slightly 
developed in young children. (Recall that 
heritability coefficients are lowest for very 
young children, suggesting that maturing of 
the cognitive system is a serious possibility.) 
Several studies, to be discussed in Chapter 
11, have shown that African American chil¬ 
dren obtain test scores in the 100 range 
(equivalent to Whites) in the early school 
years. African American scores decline to 
a mean of roughly 85 by adulthood. The 
control group appears to be doing this. The 
experimental group also showed a regression 
effect, but maintained its advantage over the 
control group into adulthood. 

The picture for academic achievement, 
which is of more social importance than
doing well on a test, is more encouraging. 
One meta-analysis 87 determined that 80% of 
the children who participated in intensive 
programs had a higher level of achievement 

87 Barnett, 1998. 
Figure 9.6. Mean IQ scores obtained by the participants in the
Abecedarian project and in a randomly chosen control group. The
intervention ended when the children entered school, at
approximately age six. Data from Campbell et al., 2001, Table 1.


than that of the median participant in the 
control group, five years or more after the 
intervention ended. Participants also experi¬ 
enced fewer social and behavioral problems. 
Entry to college was also higher in partici¬ 
pants of these programs, compared to con¬ 
trol groups. 
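
A statement of the form "80% of participants scored above the control median" can be translated into a standardized effect size. The sketch below assumes that both groups are normally distributed with equal standard deviations, which the meta-analysis may not have assumed; the function name is illustrative only.

```python
from statistics import NormalDist


def d_from_share_above_control_median(share):
    """Effect size implied by the share of treated cases above the control median.

    Assumes normal distributions with equal standard deviations in both
    groups (an illustrative assumption, not a claim about the original data).
    Under that assumption the share equals Phi(d), so d = Phi^-1(share).
    """
    return NormalDist().inv_cdf(share)


implied_d = d_from_share_above_control_median(0.80)
print(round(implied_d, 2))
```

Under these assumptions the 80% figure corresponds to a difference of a bit more than four-fifths of a standard deviation in achievement.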

So what does the scoreboard say? State¬ 
ments like “Head Start is a (failure) (suc¬ 
cess)” mask agreement over the facts, and 
disagreement over how to describe them. If
you believe that the role of early child¬ 
hood intervention and special education is 
to improve intelligence, as measured by 
test scores, the critics are right that at 
best marginal improvements can be made. 88
However, the programs do improve a num¬ 
ber of academically relevant behaviors, rang¬ 
ing from study skills to improvement in 
interacting with students and teachers. Such 
behavior leads to improved learning in for¬ 
mal educational settings. These are not neg¬ 
ligible benefits. 

Scaling up a project like the Abecedarian
project to cover a substantial portion of low 
SES children would require a huge financial 
investment and pose a tremendous problem 
of staffing. One can argue over the relative 


values of the costs and benefits, and how 
they should be assessed. These are educa¬ 
tional policy issues, not issues in the study 
of intelligence. 

9.4.5. The Home Environment:
Competition for Resources 

The behavior genetic studies reviewed in 
Chapter 8 have consistently found that 
‘nonshared environmental influences,’ which
refers to environmental influences that differ
between family members, are a substantial
source of environmental effects on
intelligence. 89 As many autobiographical
accounts make clear, one of the biggest 
differences in a child's early life is sim¬ 
ply whether he or she is raised in a large 
family. The extreme, I suppose, would be 
a medieval royal family, where early-born 
children, as potential heirs to the throne, 
received very different treatment than did 
later-born children. As anyone who has 
either raised or been raised in a large fam¬ 
ily will attest, having several brothers and 
sisters is not like being an only child. 

Children raised in large families tend, on 
the average, to have lower test scores than 


88 Detterman & Thompson, 1997; Herrnstein & 
Murray, 1994; Jensen, 1998. 


89 Jensen, 1997. 
Panel 9.9. The National
Longitudinal Studies 

The National Longitudinal Studies are 
studies carried out by the US Depart¬ 
ment of Labor in order to obtain an accu¬ 
rate picture of the demographics, social 
structure, and economics of the United 
States. Such information is needed by 
policy makers. The surveys also provide 
valuable sources of data for economists, 
sociologists, educational researchers, and 
increasingly for psychologists. In general, 
the surveys utilize a random sample of 
US citizens relevant to the purpose of the 
survey. On occasion certain subgroups of 
interest are intentionally oversampled, in 
order to provide more accurate informa¬ 
tion about them. 

The National Longitudinal Study of 
Youth 1979 (NLSY79) was a survey of 
over 10,000 young men and women, aged 
fourteen to twenty-two when the sur¬ 
vey was initiated in 1979. This sample 
has been followed up periodically. Infor¬ 
mation on health, economics, and social 
status has been obtained. At the time 
of the first survey a fortuitous event 
occurred. The Department of Defense 
needed to update its normative sample 
of the Armed Services Vocational Apti¬ 


tude Battery (ASVAB) and the associ¬ 
ated Armed Forces Qualification Test 
(AFQT). Accordingly, a large percent¬ 
age of the NLSY79 participants took 
the ASVAB. The result was an impor¬ 
tant prospective study of intelligence, 
for the ASVAB scores of the fourteen- 
to twenty-two-year-old participants can 
be related to their subsequent progress 
through life. Herrnstein and Murray 
made extensive use of the NLSY79 
database in their provocative 1994 book. 

The NLSY has “spawned” (in a somewhat
literal sense) a second survey, the
NLSY Children and Young Adults study, 
in which data is gathered on children 
of the NLSY79 participants. This pro¬ 
vides social scientists with valuable data 
on changes in the population over gener¬ 
ations. 

The NLSY97 survey is something of 
a repeat of the 1979 survey. Over 9,000 
young men and women born in the 1980- 
84 period were enrolled, and are inter¬ 
viewed on an annual basis. The pur¬ 
pose is to track social issues concerned 
with the transition from youth to adult¬ 
hood. Comparisons to the NLSY79 data, 
when possible, provide a way of compar¬ 
ing cohorts as they transit from youth to 
adult life. 


children raised in small families. This trend 
is shown in Figure 9.7, which presents data 
from the National Longitudinal Study of 
Youth 1979 (described in Panel 9.9).
Similar findings have been obtained in many 
other studies. There is a correlation somewhere
between -.15 and -.20 between
family size and children’s intelligence test 
scores. 90 
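
To put correlations of this size in perspective, squaring a correlation gives the proportion of variance in one variable that is linearly associated with the other. The short sketch below applies that standard identity to the range just quoted; the function name is illustrative.

```python
def variance_explained(r):
    """Proportion of variance linearly shared by two variables with correlation r."""
    return r * r


for r in (-0.15, -0.20):
    print(f"r = {r:+.2f}: proportion of variance = {variance_explained(r):.4f}")
```

Family size is thus associated with roughly 2 to 4 percent of the variance in children's test scores: a reliable relation, but a small one.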

While the fact is clear, the reason for 
the fact is definitely not clear. Most debates 
stress either genetic or environmental influ¬ 
ences. Galton worried that because large 
families seemed to produce less intelligent 

90 Anastasi, 1956; Herrnstein & Murray, 1994; Lynn, 1998;
Lynn & Harvey, 2008; Lynn & Van Court, 2004.


people the intelligence of the society, as a 
whole, must fall. Essentially the same concern
over a “dysgenic effect” was voiced
by Cattell in the mid twentieth century, 91 
and by Richard Lynn more than fifty years 
later. 92 A key point in their argument is 
that family size increases as maternal intel¬ 
ligence drops, as is shown in Figure 9.8. The 
data shown are typical of numerous similar 
findings. 

Those who worry about dysgenics, from 
Galton to Lynn, appear to assume that 
graphs such as these represent a proximal 
genetic effect; unintelligent mothers have 

91 Cattell, 1940. 

92 Lynn, 1998. 
Figure 9.7. Mean intelligence estimates of young children born to
mothers who were in the NLSY79 survey. The estimate is based on
the Peabody Test, which is suitable for children in the pre-school
and elementary school age range. Based on Rodgers et al., 2001,
Table 4.


children who are genetically unintelligent, 
and have lots of them. Before we accept 
this argument a third phenomenon has to 
be considered, the birth-order effect. 

Intelligence test scores are related to 
birth order; firstborns, on the average, have 
higher scores than second-borns, and so on. 
Figure 9.9 shows the results of a study 
from Norway, in which birth-order effects 
were shown in the intelligence test scores 
obtained by Norwegian young men when 
they registered for military service. 93 The 
birth-order effect is clearly social. If an older 
sibling dies, the next sibling in line assumes 
the “benefits” of the older sibling’s place. 
Evidently acquiring intelligence is a bit like 
the medieval rules for acquiring a kingship: 
oldest surviving child gets the crown and the 
IQ points. 

Birth-order effects are not apparent in the 
NLSY data set (Figure 9.7). This is not a 
contradiction of the evidence for birth-order 
effects, for the children of NLSY partici¬ 
pants were tested before they were ten. If 

93 Bjerkedal et al., 2007; Kristensen & Bjerkedal, 2007. 


birth-order effects are based on a competi¬ 
tion for resources, the effect may accumu¬ 
late over time, rather than being apparent 
early in the child’s life. At the same time, the 
birth-order effect cannot be the sole cause 
of the family-size effect, for, as Figure 9.7 
shows, the family-size effect can be obtained 
in a situation where there is no birth-order 
effect. 

Robert Zajonc, a professor at the Uni¬ 
versity of Michigan, and his colleagues have 
developed a model of family dynamics that 
they refer to as the confluence model. 94
Zajonc argues that a child’s intelligence will 
be stimulated by an intellectually challeng¬ 
ing environment, and that the mean age of 
the family is an indicator of the intellec¬ 
tual challenge in the home environment. To 
illustrate, a seven-year-old only child whose 
father and mother are both thirty will be 
in a family where the mean age of family 
members is (30 + 30 + 7)/3, or 22 1/3. For a
seven-year-old with a three-year-old sibling,
the mean age of family members is (30 +
30 + 7 + 3)/4, or 17 1/2. This accounts for the

94 Zajonc, 1983; Zajonc, Markus, & Markus, 1979. 
Figure 9.8. The relation between number of children and maternal
intelligence score (AFQT percentile score) in the NLSY79 data set.
Source: Rodgers et al., 2001, Table 6.


family effect; large families will have lower 
mean ages of the members. 
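
The mean-age index used in this illustration is simple to compute. The sketch below reproduces only the arithmetic of the two example families (parents aged thirty, a seven-year-old, and optionally a three-year-old sibling); it is not an implementation of the full confluence model, and the function name is hypothetical.

```python
from fractions import Fraction


def mean_family_age(ages):
    """Mean age of all family members, kept as an exact fraction."""
    return Fraction(sum(ages), len(ages))


only_child = mean_family_age([30, 30, 7])        # two parents, one child
with_sibling = mean_family_age([30, 30, 7, 3])   # same family plus a three-year-old

print(float(only_child), float(with_sibling))
```

Each additional young child pulls the index down, which is how the illustration ties larger families to a less intellectually challenging home environment.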

The general tone of Zajonc’s argument,
that large families do not provide good envi¬ 
ronments for the development of intelli¬ 
gence, has been widely accepted. It has even 
led to “popular psychology” advice against
having large families, on the grounds that 
the social environment of a large family will 
work against the development of children’s 
intelligence. 95 

The facts are clear; the explanation is not. 
Here are three alternatives. 

1. A proximal dysgenic hypothesis: Moth¬ 
ers’ test scores indicate their genetic 
potential (probably in part correct), and 
they pass on this potential to their off¬ 
spring (certainly true). The offsprings’ 
test scores reflect the offsprings’ genetic 
potential (probably in part correct). 
Genetic potential acts as a proximal 
influence on intelligence, thus produc¬ 
ing the family-size effect. 

2. A distal genetic hypothesis: Low mater¬ 
nal intelligence results in poorer parent¬ 
ing practices, including a tendency to 
begin child bearing at an earlier age. 
This practice may itself be partially 
due to genetic influences. The environ¬ 
ment in large families tends to restrict 
the development of intelligence. Genet- 

95 See, e.g., Brothers, 1981. 


ics acts as a distal variable, leading to 
environmental practices that, as prox¬ 
imal variables, directly influence the 
intelligence of offspring. 

3. An environmental explanation: The 
following argument is based on Nis- 
bett’s stress that culture matters in the 
development of intelligence. 96 Mater¬ 
nal intelligence test scores are negatively 
correlated with SES, and are also cor¬ 
related with membership in low-status 
racial and ethnic groups. These groups 
have reduced access to resources in time 
and money that could improve parent¬ 
ing practices. In addition, there is a 
tendency for them to follow cultural 
practices that lead to childbearing at a 
younger age, and therefore to produce 
large families in which children have to 
compete for resources. 

An argument based on proximal genetic 
influences, alone, cannot be maintained 
because it cannot account for the birth-order 
effect. It also cannot account for the world¬ 
wide trend toward reduced family sizes as 
people move into urban settings. 97 

Either the distal genetic or environmen¬ 
tal explanations could account for all three 
effects: family-size, maternal intelligence, 
and birth order. As is often the case with 
social and behavioral phenomena, multiple 

96 Nisbett, 2009. 

97 Mace, 2008. 
Figure 9.9. The birth order effect appears in families where a
child becomes the oldest due to the death of an elder sibling.
Scores have been adjusted to allow for parental education
level, maternal age at birth, birth weight, and cohort effects.
Bars show 95% confidence intervals. From Kristensen, P., &
Bjerkedal, T. (2007). Explaining the relation between birth
order and intelligence. Science, 316(5832), 1717, Figure 1.
Reprinted with permission from AAAS.
explanations are possible, and they are often
not mutually exclusive. 

Family-size effects, along with the 
cohort effect and the repeated findings 
of large heritability coefficients, represent 
well-documented phenomena with unclear 
explanations. Psychological research has 
established what happens, but not why it 
happens. 

9.4.6. Summary and a Value Judgment 

Family environments clearly do exert an 
effect on children’s development of intel¬ 
ligence. Within the range of normal fam¬ 
ily environments in the developed nations, 
roughly within the top two-thirds of 
the socioeconomic spectrum, these effects 
appear to be modest and ephemeral. They 
virtually disappear in adulthood. The sit¬ 
uation is quite different when we look at 
extreme environments, such as the homes 
of the children at risk for lowered cognitive 
development - for example, those who participated
in the Abecedarian project.


Unfortunately, though, this does not get 
us very far, for SES is a composite, abstract 
variable that covaries with many other 
variables. It is difficult to disentangle the 
effects of family social practices - such as 
failure to encourage children's problem solv¬ 
ing, authoritarian styles of adult-child inter¬ 
action, and other practices considered bad 
parenting - from concomitant deficiencies in 
the physical environment and genetic inher¬ 
itance. We cannot unambiguously assign 
damaging SES effects either to heredity, or 
to social or physical variables. Just as there 
may be many genes affecting intelligence, 
each one individually making a small contri¬ 
bution, there may be many familial variables 
affecting intelligence, each one individually 
making a small contribution. 

High-cost intervention projects have pro¬ 
duced improved cognitive performance. 
Unfortunately, the effects are far less than 
most social engineers would like to see. 
Many less expensive projects, such as the 
typical Head Start program, show very lit¬ 
tle effect on test scores, or on other purely 
cognitive behavior, beyond the early school
years. However, there are indications that 
these programs do prepare at-risk children
for the social experience of school, and that 
this may facilitate later school development. 

The last point is very important. In indus¬ 
trial and post-industrial societies schooling 
is a major factor in determining a person’s 
contribution to the general society. In 2006 
it was estimated that in the United States 
a person without a high school diploma 
earned only two-thirds as much as someone
with one. 98 Other social problems, such as 
poor health and criminal convictions, also 
are negatively correlated with the possession 
of a diploma. Any program that increases 
the likelihood that children in at-risk situations
will complete schooling is a socially
valuable program. It has also been estimated 
that even the more expensive programs are 
cost-effective, when the total cost to society 
over a long period of time is considered. 99 I 
must admit, though, that these projections 
are based upon a number of assumptions 
that I, for one, find suspect. 

I close this section with a comment that is 
avowedly, and openly, a statement of social 
beliefs. 

I once heard an address by Sandra Scarr, 
herself a major contributor to the litera¬ 
ture on genetic and environmental effects 
on intelligence, and by no means a person 
who denies the importance of genetics. I 
regret that I have no reference except my 
memory. Scarr pointed out that a child born 
to a poor family, with limited resources, 
is often in a bleak environment. It is a 
good thing, in itself, to provide programs 
offering such children better nutrition, a 
safe place to interact with adults and other 
children, and an interesting, challenging 
environment. If the resulting program also 
improves their cognitive abilities, their intel¬ 
ligence in the important, conceptual sense, 
then that is a very nice benefit of having done 
a good thing. If the program just improves 
test scores, but does not improve cognitive 

98 Retrieved from centerforpubliceducation.org, June 2008.

99 Barnett, 1998. 


abilities in the more general sense, then that 
is just a curiosity. Whatever the outcome, 
society should provide as good a social envi¬ 
ronment as can be arranged. You do not 
need a reason to do a good thing, you should 
just do it. 

I find Scarr's reasoning compelling. Pro¬ 
viding aid to disadvantaged children is a duty 
owed to the children. Costs and benefits are 
relevant when comparing programs; there is 
no sense in spending more than you have to. 
However, I believe that society has a duty 
to provide such programs, just as much as it 
has a duty to provide military defense and 
security for the aged. 

9.5. Education 

The cohort effect shows that changes in the 
environment can result in major changes 
in intelligence, on a population basis. Over 
the twentieth century the industrially devel¬ 
oped countries saw improvements in nutri¬ 
tion, better health practices, smaller family 
sizes, and increases in the availability of pre¬ 
school programs. There have also been huge 
changes in education. 

The US data, which mirrors that of other 
industrially developed countries, shows how 
strong the educational change has been. In 
1900 the school enrollment rate for fifteen- 
to nineteen-year-olds was 50%; it rose to 75% 
by 1940, and has remained relatively stable 
at slightly above 90% since 1990. In 1910 the 
median number of years of education for 
people twenty-five years old or older was 
8.1 years. In 1940, at the outset of World 
War II, it was only 8.6. In 2008 better than 
85% of adults had twelve or more years of 
education. 100 In the first half of the twentieth 
century the big change in education was that 
more people gained basic skills in the traditional
“reading, writing, and ’rithmetic”
than had been the case earlier. Illiteracy, 
arguably the most important single result of 
a lack of formal schooling, had fallen from 
10.7% in 1900 to 2.9% in 1940. By 2000 the 

100 U.S. Census Bureau, 2010 Statistical Abstract, Table 224.
illiteracy rate was less than 1%. The second
half of the twentieth century was character¬ 
ized by marked increases in secondary and 
tertiary education, especially for women and 
minority group members. By 2007, 29% of 
the adults in the US had a college degree. 
Similar trends have been observed in other 
industrialized countries. It is not unreason¬ 
able to assume that all this education had 
an effect on cognitive skills, and hence, on 
intelligence. But just what? 

Once again we have to be clear about 
our definitions. If we take the attitude that 
intelligence is essentially g, as indexed by 
a g-loaded nonverbal test, such as a pro¬ 
gressive matrix test, education may not 
increase intelligence that much. Alterna¬ 
tively, if we accept the fluid intelligence- 
crystallized intelligence (Gf-Gc) distinction, 
we may find different effects of education 
on Gf and Gc. I will generally take this 
approach, as I believe that it offers a use¬ 
ful perspective for dealing with the effects 
of schooling. I will keep coming back to 
my point that the important components of 
intelligence are those skills that are useful in 
society. Test scores are of interest only to the 
extent that they indicate which examinees 
possess these skills. 

We also have to be careful about what we 
mean when we talk about education. Do we 
mean any form of training? Do we include 
commercial programs that are intended to 
increase one’s intelligence (many are adver¬ 
tised, few have been evaluated), or do we 
restrict our interest to programs of formal 
education, the familiar K-12 and college/ 
university systems? I will consider two issues: 
the influence of the formal system and the 
(very few) attempts that have been made 
to teach general reasoning skills and that 
have been formally evaluated. And where 
do commercial programs to increase scores 
on tests like the SAT fall on this spectrum? 

9.5.1. What the Educational System
Tries to Do 

The developed countries hand over a great 
deal of the task of educating the young to 
formal school systems. This is a marked contrast
to the situation less than 300 years
ago, when most children learned what they 
needed to know by some form of appren¬ 
ticeship, augmented by a very small amount 
of formal training. Formal schools did exist 
in both pre-literate and ancient societies. 
The seafaring Polynesians had schools to 
teach navigators to guide canoes on ocean 
voyages of up to a thousand miles. As 
part of their training they learned to use 
sophisticated star charts. 101 In ancient Egypt 
scribes went to school to learn account¬ 
ing, surveying, construction, mathematics, 
and engineering. 102 What is different about 
our society is the near-universality of formal 
schooling, not the concept itself. 

Today’s K-12 school systems are responsi¬ 
ble for transmitting three classes of knowl¬ 
edge: the basic beliefs and traditions of the 
society, cognitive and motor skills useful 
in solving frequently encountered problems, 
and specific pieces of knowledge that will be 
required in utilizing these skills. 103 To illus¬ 
trate, somewhat flippantly, for the United 
States circa 2010, this means, “God Bless 
America, learn to read, and a camel’s a 
mammal.” 

Upwards of 50% of the citizens in 
the industrially developed nations receive 
some form of post-secondary education. 
While there are some discussions of 
the importance of a liberal education, 
most post-secondary instruction is oriented 
toward the training of specialists in fields 
ranging from welding to the law. The result 
has been a tremendous extension of the time 
between the end of childhood and the point 
at which a fully trained adult enters soci¬ 
ety. Alexander the Great was twenty-two 
when he led the Macedonian/Greek army 
into Asia. Napoleon was appointed brigadier 
general at twenty-four. In today’s military 
they would both be lieutenants. 

Richard Snow, an educational psychol¬ 
ogist at Stanford University, has argued 
that schools are successful if they provide 

101 Hutchins, 1983. 

102 Information provided by the King Tutankhamen exhibit
at the Field Museum, Chicago, 2006.

103 Cole, 2005. 
students with useful aptitudes, by which he
meant skills that can be used to operate 
in the larger society and to guide further 
learning. 104 Snow divided the cognitive aptitudes
into two classes: specific skills (e.g.,
reading, knowledge of history) used to solve 
certain classes of problems, and general rea¬ 
soning skills used both to solve problems and 
to guide further learning. Snow's categoriza¬ 
tion of aptitudes mirrors the Cattell-Horn 
distinction between Gf and Gc, but was set 
in a broader context than that of the testing 
paradigm. 

Snow pointed out that schools also taught 
skills in self-management and cooperative 
problem solving. While some may not wish 
to include such skills in the definition of 
intelligence, they are certainly relevant to 
the application of one's intelligence to prob¬ 
lems outside of a school setting. These 
skills are often taught implicitly, by proce¬ 
dures that Snow referred to as a metacur¬ 
riculum, encouraging independent inquiry 
and cooperative studies. This is certainly 
the approved practice in education today, 
although it is not always what actually hap¬ 
pens. It represents a departure from the 
authoritarian, didactic methods that were 
common in all schools up to about 1950, and 
that are not unknown today. 105 

9.5.2. The Evidence for Educational
Influences on Intelligence 

The American President Theodore 
(“Teddy”) Roosevelt is reputed to have 
said, 

A man who has never gone to school may 
steal from a freight car; but if he has a uni¬ 
versity education, he may steal the whole 
railroad. 

Attributed to Roosevelt by Laurence 
Peters (1977, p. 117)

Can you imagine a more ringing affirmation 
of the value of education? 

I suspect that Roosevelt would have 
regarded the question “Does schooling 

104 Snow, 1996. 

105 Bransford, Brown, & Cocking, 1999. 


increase intelligence?” as trivial, because it is
obvious that educated people are generally 
capable of solving problems that uneducated 
people cannot. Many of these problems are 
socially relevant. They range from reading 
and understanding newspapers to balancing 
checkbooks and comprehending the terms 
of mortgages. A substantial part of a per¬ 
son's intelligence, in the conceptual sense of 
being able to solve socially relevant prob¬ 
lems, is clearly the product of education. 

What about the effects of education upon 
intelligence in the much narrower sense of 
improving test scores? Today that is a fairly 
hard question to answer, because virtually 
everyone goes to school at least up to the 
point at which education shifts toward spe¬ 
cialty training. However, this was not always 
true, so prior to World War II it was possi¬ 
ble to conduct a study contrasting groups 
of children, of similar socioeconomic back¬ 
ground, who either did or did not have 
access to formal schooling. This situation 
often occurred for reasons that had nothing 
to do with the personal traits of the students, 
such as the decision to locate a road, and 
with it a school, near one community and 
distant from another. The children who had 
access to schools had higher intelligence test 
scores than those who did not. The differ¬ 
ence in test scores between groups increased 
with increases in the difference in the avail¬ 
able schooling. 106 

In today's society there is a positive rela¬ 
tion between a person’s intelligence test 
score and his or her level of education. High 
school drop-outs have generally lower intel¬ 
ligence test scores than those who complete 
high school. The effect interacts with SES; 
low intelligence is more predictive of drop¬ 
ping out for students from low SES fami¬ 
lies than it is for students from moderate or 
high SES families. 107 But what is cause and 
what is effect? The correlation could come 
about because intelligence produces success 
in school, or because increasing amounts of 
schooling produce intelligence, or both. 

106 Ceci, 1990; Bronfenbrenner et al., 1996; Ceci & Williams, 1997.

107 Herrnstein & Murray, 1994, Chapter 6.
Figure 9.10. The path diagram for a study
showing the influence of education on 
intelligence. It is necessary to show that 
education has an influence on intelligence at 
time 2 (the ? in the figure) in addition to the 
influence exerted by intelligence at time 1. 

One way to disentangle the situation is 
to look at cases in which nonschool intelli¬ 
gence test measures are taken before some 
variable period of schooling, followed by a 
second intelligence test. The path diagram 
for this design is shown in Figure 9.10. The 
question is whether scores on the second 
intelligence test are influenced by schooling, 
after accounting for the relation between the 
first and second tests. Herrnstein and Murray 
analyzed scores for people in the NLSY79
for whom first test scores were available 
(see panel 9.9) and concluded that schooling 
had little, if any, effect. This result has been 
widely quoted. However, a detailed reanaly¬ 
sis of the same data by Christopher Winship 
(Harvard University) and Sandor Korenman 
(City University of New York) 108 questioned 
their conclusion. Winship and Korenman 
also reviewed a number of other studies 
using some variant of the design sketched 
in Figure 9.10. A Norwegian study was par¬ 
ticularly important, because it provided an 
unusually good natural experiment. 

Norwegian schoolchildren are given intel¬ 
ligence tests at age thirteen. At age eighteen 
all Norwegian men take an intelligence test 
as part of their registration for military ser¬ 
vice. The eighteen-year-old population con¬ 
tains some people who are still students, 
and others who have dropped out at various 
periods following completion of compulsory 

108 Winship & Korenman, 1997. 


education. Those who continued as students 
tended to have higher test scores at age eigh¬ 
teen than did the registrants who dropped 
out of the educational system. This fact, 
alone, is not compelling evidence, because it 
could be because the more intelligent people 
stayed in school longer. However, the peo¬ 
ple who stayed in school had higher scores 
at age eighteen than would have been pre¬ 
dicted from their scores at age thirteen, indi¬ 
cating a beneficial effect of schooling. 
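
The logic of this comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are invented, not the Norwegian data, and the function names are hypothetical. The sketch fits a line predicting age-eighteen scores from age-thirteen scores, then asks whether those who stayed in school score above that prediction on average.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def mean_residual_by_group(score13, score18, stayed):
    """Average (observed - predicted) age-18 score for stayers and leavers."""
    a, b = fit_line(score13, score18)
    residuals = [y - (a + b * x) for x, y in zip(score13, score18)]
    stay = [r for r, s in zip(residuals, stayed) if s]
    left = [r for r, s in zip(residuals, stayed) if not s]
    return mean(stay), mean(left)

# Invented scores for ten hypothetical registrants (not real data).
score13 = [90, 95, 100, 105, 110, 90, 95, 100, 105, 110]
score18 = [93, 98, 103, 108, 113, 89, 94, 99, 104, 109]
stayed = [True] * 5 + [False] * 5

stay_resid, left_resid = mean_residual_by_group(score13, score18, stayed)
```

A positive average residual for the stayers and a negative one for the leavers is the pattern described above. A real analysis would also have to deal with measurement error and with other variables that differ between the two groups.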

On the basis of their own analyses and 
the studies that they reviewed, Winship 
and Korenman estimated that an additional 
year of education (through the K-12 years) 
adds approximately 2.7 IQ points to a per¬ 
son’s test score. Speculating a bit, suppose 
we combine this estimate with data on the 
increase in educational attainment in the 
United States. In 1946 the median of edu¬ 
cational attainment for adults was 8.6 years. 
Today the median is “some college,” which 
I arbitrarily (and conservatively) set at 
13.1 years. Most of this increase has occurred 
since 1940. Taking the Winship and Koren¬ 
man figures at face value, this implies that 
a 4.5 x 2.7 = 12.15 IQ point rise in test
scores over the last half of the twentieth cen¬ 
tury could be accounted for by increases in 
education. 109 
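
The back-of-the-envelope calculation can be written out explicitly. The variable names are mine; the numbers are the ones quoted above, and the result is only as good as the assumption that the 2.7-point estimate applies uniformly to every added year.

```python
# Figures quoted in the text; variable names are illustrative only.
iq_points_per_year = 2.7      # Winship and Korenman's estimate
median_years_then = 8.6       # earlier median educational attainment
median_years_today = 13.1     # "some college," set conservatively

extra_years = median_years_today - median_years_then
implied_rise = extra_years * iq_points_per_year

print(f"{extra_years:.1f} extra years x {iq_points_per_year} = {implied_rise:.2f} IQ points")
```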

What does education do to raise intelli¬ 
gence? We need to distinguish between apti¬ 
tudes that are taught explicitly and those 
that are taught implicitly. 

Explicitly, educational systems try to pro¬ 
vide students with the knowledge and cog¬ 
nitive skills required to operate in their soci¬ 
ety. This is very close to the definition of 
Gc. Not surprisingly, all battery-type intel¬ 
ligence tests include tests that assess such 
knowledge and skills, either directly or indi¬ 
rectly. This is certainly true of the Armed 
Forces Qualifying Test (AFQT) described 
in Chapter 3 of this book, and treated by 
some researchers (including Herrnstein and 
Murray) as virtually synonymous with g. 

109 Nancy Robinson, a developmental psychologist at 
the University of Washington, has pointed out to 
me that this argument does not explain why scores 
on young children's intelligence tests have also risen. 
The AFQT is based on the Armed Services
Vocational Aptitude Battery (ASVAB) subtests
of word knowledge, arithmetic reasoning,
paragraph comprehension, and mathematics
knowledge - all subjects that are
explicitly taught in school. As a result, the
ASVAB, and of necessity the AFQT, evaluates
Gc. 110

Schools implicitly teach three other apti¬ 
tudes that are important in modern society: 
problem solving removed from the imme¬ 
diate context of life, abstract conceptual 
thinking, and (more recently) the ability to 
detect patterns in stimuli. There is no claim 
that the teaching is perfect, or that schooling 
is the only way to develop reasoning skills. 
There are many illustrations showing that 
relatively unschooled people can exhibit 
sophisticated reasoning in the context of 
everyday problems. 111 These cautions do not 
detract from the main point. All that needs 
be maintained is that schooling increases the 
likelihood that children will acquire power¬ 
ful, abstract problem-solving methods that 
can be applied to a variety of problems. 

Literacy is probably the most power¬ 
ful of these skills. Studies contrasting lit¬ 
erate and nonliterate societies have shown 
that the possession of literacy increases will¬ 
ingness to think about hypothetical situa¬ 
tions, and that literacy increases the ten¬ 
dency to use abstract classification systems 
based on object features, rather than clas¬ 
sification based upon a concrete situation. 
Classification-based reasoning tasks are part 
of many intelligence tests. In addition, 
modern educational methods have stressed 
the importance of evaluating evidence by 
detecting patterns. This is particularly the 
case for instruction in science and mathe¬ 
matics. Spearman argued that this ability - 
“eduction,” in his terms - is one of the most 
important components of intelligence. 112 An 
analysis of the techniques used in modern 
elementary school mathematics classes has 
shown not only that the schools teach “eduction,”
but also that some of the exercises that
they use to do so closely resemble the sorts
of items that appear on intelligence tests of
inductive ability. 113 An example is shown in
Figure 9.11.

110 Roberts et al., 2000.

111 See Cole, 2005; Hunt & Minstrell, 1994, 1996; and Lave, 1988, for some examples.

112 Spearman, 1923.

Figure 9.11. A problem similar to the problems
presented in K-2 mathematics textbooks. The
task is to complete the sequence of figures.
Based on examples provided in Blair et al., 2005.

The relation between intelligence and 
schooling is interactive. Not everyone will 
gain exactly 2.7 IQ points from a year of
education. There are strong feedback loops 
between intelligence and the effects of edu¬ 
cation; more intelligent people learn more, 
and often quite a bit more, from schooling 
than do less intelligent people. We go into 
this in some detail in Chapter 10. Neverthe¬ 
less, the overall message is clear. Education 
does increase intelligence, both in the nar¬ 
row sense of increasing IQ scores and in the 
far more important sense of increasing one's 
ability to solve life’s problems. 

Was Roosevelt’s praise of university edu¬ 
cation justified? Although some university 
officials claim that the purpose of a (lib¬ 
eral) college education is to teach people 
to think, the claim has never been tested. 
This is due, in no small part, to university 
educators’ failing to agree on just what the 
empirical definition of “being able to think” 
ought to be. Of course, we do know that 
colleges and universities do an excellent job 
of training high-level specialists such as engi¬ 
neers and physicians. And for a direct test of 
Roosevelt’s claim? 

In the first six months of 2009 two sepa¬ 
rate financial scandals were revealed. Quite 
independently of each other, Bernard Madoff
and Allen Stanford, both university graduates,
had bilked investors of tens of billions
of dollars in sophisticated frauds. This makes 
Robin Hood look like a penny ante thief, and 
makes Roosevelt look prescient. 

113 Blair et al., 2005.
9.6. Training Intelligence

In this section we move from considering the 
effects of schooling to the effects of briefer 
training programs that are supposed to influ¬ 
ence general reasoning skills. It is important 
to distinguish these programs from programs 
that are essentially adjuncts to the formal 
education program, in which they teach spe¬ 
cific topics, for example, reading or mathe¬ 
matics. Some of the supplementary teach¬ 
ing programs are good, some are not, and 
there is no point in discussing them. Other 
commercial programs are marketed as ways 
to improve your thinking, which is a rather 
nebulous claim. I do not know of a single 
one of these programs that has ever offered 
acceptable scientific evidence of its effec¬ 
tiveness. The programs are marketed on the 
basis of endorsements by users, rather than 
by comparisons between experimental and 
control groups. A reviewer with scientific 
training will want to scream “uncontrolled
placebo effects!”

There is a lively market in coaching pro¬ 
grams intended to improve students' scores 
on socially important tests, such as a college 
entrance test. The SAT is a favored target. 
These programs do work. It is worth looking 
at their claim to have improved intelligence. 

In principle, there is another way 
to improve intelligence through training. 
While we certainly do not know all about 
the basic cognitive processes that under¬ 
lie intelligence, we do know a good deal 
(cf. Chapters 6 and 7). Could intelligence 
be improved by training basic information¬ 
processing functions, by analogy to the way 
in which athletic performance is improved 
by exercise? There has been some progress 
along this path. 

9.6.1. Coaching Programs that Raise 
Test Scores 

Suppose that I offered to sell you a training/ 
coaching program that would improve your 
score on test X, where test X is any one of 
the commonly used tests related to intelli¬ 
gence, ranging from the SAT to a progressive 
matrix test. Should you buy it? 


The question is not fanciful. Several 
coaching programs that improve scores on 
college entrance and similar examinations 
are available. These programs do increase 
test scores. But do they increase intelligence? 

The distinction between crystallized and 
fluid intelligence is relevant. Gc, in its gener¬ 
alized sense of culturally useful knowledge, 
rather than in the more limited sense of 
questions about knowledge that find their 
way into an intelligence test, is certainly 
part of intelligence. Note that the gen¬ 
eralized definition includes most of what 
Robert Sternberg has referred to as “practical 
intelligence.” 114 Of course, such knowledge 
can be acquired by coaching, which then 
becomes an extension of formal education. 
If you are going to have to take an examination
in Spanish, it makes sense to be trained
in Spanish, and as you receive this training
you will probably pick up some nonlinguistic
but useful information about Spanish
culture and society. Gc will be increased. 
It is not surprising that a coaching pro¬ 
gram would work, and if the price is right, 
buy it. 

A coaching program might also increase 
scores by improving skills that are useful in 
test taking, but not useful for much else. An 
example is the strategy of improving your 
chances on a multiple choice test by rul¬ 
ing out obviously wrong alternative answers, 
and then guessing. It might be rational to 
purchase a coaching program that taught 
test-specific skills if passing the test were a 
goal in itself, but not if you were interested 
in improving the socially relevant mental 
abilities that the test measured. 

Two psychologists at the Educational 
Testing Service, Samuel Messick and Ann 
Jungeblut, have shown that these distinctions
apply to the coaching programs marketed
as ways to prepare for the SAT. 115
These programs vary from a two- or three-hour
session on “how to take a test” to intensive
tutoring over a period of several months.
Depending on the program, documented
gains on the verbal and mathematical
sections of the SAT ranged from 10 or 20
points per section (.10 to .20 in standard deviation
units) to as much as 100 points (1 standard
deviation unit), which is quite a lot.
There was a correlation of .7 between the
length of the coaching program and the size
of the program’s effect. Not surprisingly,
the shorter coaching programs concentrated
on test-taking skills, while the longer ones
amounted to education, which translates
into a real gain in Gc.

114 Sternberg, 2003; Sternberg et al., 2000.

115 Messick & Jungeblut, 1981.

Many people would regard these remarks 
as simple statements of the obvious. When 
people talk about improving intelligence, 
or alternative terms, such as “reasoning” or
“problem solving,” I think they are generally
talking about fluid intelligence - the ability 
to solve unexpected or unfamiliar problems. 
Can this be done? 

Here we bump up against a problem of 
definition. It is easy enough to see whether 
a training program has developed a person’s 
vocabulary, or his or her ability to do specific 
types of problem solving, in anything from 
carpentry to physics to the law. But how 
are we to know whether a person has been 
trained to think? A program to teach intelli¬ 
gence is successful to the extent that it raises 
some of the skills that constitute the general 
definition of intelligence. In order to mea¬ 
sure the effect of a program we measure per¬ 
formance on a test of fluid intelligence. This 
alone is not enough. We have to show that 
improved test performance has been accom¬ 
panied by improved performance in socially 
relevant tasks outside of the testing context. 
Otherwise we are open to the charge that 
what has been taught are test-specific skills. 
This is a real issue. Not surprisingly, if people 
take the same cognitive test twice, they get 
better scores the second time. This applies to 
tests as different as battery-type intelligence 
tests and progressive matrix tests. However, 
statistical analyses of the improved scores 
show that they are not related to the gen¬ 
eral reasoning factor (g). Test-specific skills 
have been improved by familiarity with the 
tests. 116 


116 te Nijenhuis, van Vianen, & van der Flier, 2007. 


What we would like to have is a demon¬ 
stration that going through some form of 
training program that is very different from 
a test of g results in improvements on a 
g-loaded test. I know of very few such 
efforts. The next section discusses two such 
attempts, one that succeeded and one that 
failed. I then make some speculative remarks 
about the sorts of situations that can lead to 
success. 

9.6.2. Mixed Results from School-based 
Programs that Might Improve Gf 

The good news for the training of intelli¬ 
gence comes from an unlikely place - the 
Sudan. In 2007 a group of researchers led
by Paul Irwing, of the University of Manchester,
and including Sudanese colleagues,
conducted a study on the effects of train¬ 
ing with the abacus, an ancient comput¬ 
ing device that is still used in the Mideast 
and northern Africa. The experimenters 
compared schoolchildren’s Raven Progres¬ 
sive Matrices performance, before and after 
the children had either completed a nor¬ 
mal school curriculum or had had the same 
curriculum, plus sixty-eight hours of abacus 
instruction, two hours per week for several 
months. Both groups showed improvement 
on the progressive matrix test, which is not 
surprising because it was a second adminis¬ 
tration, and also because the children were 
now almost half a year older. The key point 
is whether the experimental groups showed 
more improvement than the control. 
Figure 9.12 shows that they did. This sug¬ 
gests that the training program did influence 
general reasoning skills. 
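
The comparison between groups is usually expressed as a standardized difference, d. The sketch below shows one common way to compute it (Cohen's d with a pooled standard deviation). The gain scores are invented for illustration, the function name is hypothetical, and I am not claiming that this is exactly how the values in Figure 9.12 were derived.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)

# Hypothetical gain scores (second test minus first test); not the Sudanese data.
abacus_gains = [6, 8, 10, 12, 14]
control_gains = [2, 4, 6, 8, 10]

d = cohens_d(abacus_gains, control_gains)
print(f"d = {d:.2f}")
```

Working with gain scores removes the improvement that both groups share, such as the benefit of taking the test a second time and of being several months older.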

The negative results come from my own 
laboratory. For a number of years my col¬ 
leagues at the University of Washington and 
I, in cooperation with Jim Minstrell, an 
award-winning high school science teacher, 
worked on the development of educational 
programs to improve high school science 
instruction, largely by presenting challeng¬ 
ing problems to be solved, and then empha¬ 
sizing the reasoning involved. 117 In some of 

117 Hunt & Minstrell, 1994, 1996. 
Figure 9.12. Abacus training improves intelligence test scores. The
difference between the experimental and control groups, in d units,
for Raven Standard Progressive Matrices scores obtained by
Sudanese children, before and after the experimental group 
completed abacus training. The d values within age group are 
approximations, based on the data in Irwing et al., 2008, Table 2, on 
the assumption that there were approximately equal numbers in the 
experimental and control groups. 


our studies we had students attempt pro¬ 
gressive matrix problems before and after 
receiving up to a year’s instruction. While 
the programs were quite successful in teach¬ 
ing reasoning about topics in introductory 
physics, we did not find any consistent gains 
on progressive matrix problems. 118 

There are so many differences between
elementary school students in the Sudan and high
school students from affluent American
school districts that one hesitates to draw
any conclusions at all. There is also a puzzling
aspect to the comparison. The problems presented
to the American students stressed
reasoning, not (for the most part) detailed 
knowledge of the minutiae of physics. Doing 
arithmetic on the abacus does not require a 
great deal of problem solving, once you have 
grasped the basic principles of the device, 
but it does require concentration and an abil¬ 
ity to hold intermediate calculations in one’s 
head. Naively, one might think that solving 
physics problems would be more like solving 
progressive matrix problems than doing aba¬ 
cus arithmetic. So why did the results come 
out the way they did? Speculation about this 
is instructive, because it raises some general 

118 Levidow, 1993. 


considerations about interpretation of find¬ 
ings in this area of research. 

The authors of the Sudanese study say 
that children in both the experimental and 
control groups were following a standard curriculum,
but provide no details about other
education-relevant or environmental
experiences that might have
differed between the groups. They did not
indicate whether experimental and control 
classes were chosen at random, something 
that is essential in evaluating educational 
research. My evaluation is that some envi¬ 
ronmental variable improved reasoning in 
the Sudanese children, and that it was prob¬ 
ably associated with abacus training, but I 
would like to have more details. 

A second possible cause of the discrepancy
in results is that the participants in the
two studies were operating
at different levels of general reasoning.
The Sudanese children were operating at the 
twelfth percentile of performance, in terms 
of the British standardization of the Raven 
Progressive Matrices test. A similar statistic 
is not available for the American teenagers, 
but they were in schools that served mid¬ 
dle to high SES families, and were voluntar¬ 
ily taking high school physics, an optional 
course that students consider to be difficult.
It is reasonable to assume that they were in 
roughly the upper third of general reasoning 
talent. 

Recall the consistent finding that we 
know quite a few aspects of the environ¬ 
ment that harm intelligence, and only a 
few that make much improvement at the 
top. The Sudanese children were probably 
operating toward the bottom of the scale 
and, compared to the American students, 
in quite unfavorable environments. If my 
conjecture is correct, their (low) reasoning 
scores would be relatively malleable. The 
American students were operating at a much 
higher level on the reasoning scale, so the 
changes that were made to the instruction, 
although beneficial in the narrow sense of 
teaching physics (the primary purpose of the 
intervention), made little if any difference in 
terms of general reasoning capabilities. 

9.6.3. Training Information-processing 
Capacities: Processing Speed and 
Automation 

Cognitive processing speed refers to the speed 
with which the brain can accomplish simple 
decision and recognition tasks. Intelligence 
theorists have tended to treat processing 
speed as a stable trait that has a correlation 
with intelligence test scores around .30 in 
college students. 119 The correlation increases 
if we expand our studies throughout the 
adult age range, because cognitive slowing 
is a general characteristic of aging. What 
apparently has not been realized (at least by 
most intelligence researchers) is how mal¬ 
leable processing speed is. 

In the 1970s Walter Schneider (now at 
the University of Pittsburgh) and Richard 
Shiffrin (now at Indiana University)
conducted a very important series of exper¬ 
iments on the influence of practice upon 
visual detection. In visual detection tasks an 
observer scans an array of stimuli to deter¬ 
mine whether or not a target stimulus is 
present. To illustrate, is the letter K (the 

119 Jensen, 2006. 


target) present in the following array? 


A R V B 
H K T P 
F W E Q 


Schneider and Shiffrin showed that under 
certain conditions training can decrease the 
time required to detect a familiar target 
by a factor of 10, from over 200 millisec¬ 
onds to as few as 20 or 30 milliseconds. 
The training requires hundreds of trials 
in which observers always search for the 
same target. 120 Further research showed that 
the phenomenon is not restricted to visual 
detection. Similar, somewhat smaller reduc¬ 
tions in detection time are found when 
observers search for exemplars of abstract 
categories, such as searching an array of 
words for an animal name. 121 

Schneider and Shiffrin presented their 
results as a demonstration of automaticity, 
the principle that if the same task is prac¬ 
ticed over and over again it becomes very 
fast, and not subject to the control of the 
relatively slow working memory-attentional 
control system. Their explanation has been 
verified by imaging studies in which par¬ 
ticipants have their brains scanned as they 
practice tasks to the point of automation. 
There is a shift in activity from the forebrain 
regions involved in the working memory- 
attentional control system to task-specific 
regions of the brain. 122 This has implications 
for higher-order cognition. 

Anders Ericsson, a professor at Florida 
State University who has conducted system¬ 
atic studies of expert behavior, has shown 
that experts in many fields practice in such 
a way as to encourage the automatization 
of those components of a complex task that 
can be automated. 123 This allows the expert
to concentrate his or her attention on those 
aspects of a task that require thought. To 
take an example from athletics, during a 

120 Schneider & Shiffrin, 1977. 

121 Fisk & Schneider, 1983. 

122 Hill & Schneider, 2006. 

123 Ericsson, Krampe, & Tesch-Römer, 1993.
match a champion tennis player will concentrate
on where to place the ball, not on how
to hit it. That has already been drilled into 
the champion’s brain, by hours and hours of 
practice. 

We may illustrate the same thing with 
respect to tasks specifically studied in intel¬ 
ligence research. I offer an example from 
a study done in my own laboratory. Recall 
that in the g-VPR model of intelligence 124 
the ability to mentally rotate a visual percept 
“in the mind’s eye” is seen as a basic dimen¬ 
sion of intelligence. An often-repeated find¬ 
ing is that mental rotation is faster in men 
than in women, and that it deteriorates over 
the adult years. With only five days of train¬ 
ing, one hour a day, we trained middle-aged 
women to perform mental rotation tasks as 
quickly as they were performed by male 
undergraduate students when they walk into 
the laboratory. We did not destroy the age 
effect, for the undergraduates also got faster 
with practice. What we did show is that 
mental rotation, the prototypical task for 
assessing R in the g-VPR model, is not invari¬ 
ant over practice. 125 

Is automation relevant to everyday prob¬ 
lem solving? The answer to this question is 
“yes, almost always a little bit, and some¬ 
times a lot.” We can think of automation 
as an increase in the speed with which 
people can recognize that certain responses 
are dictated by the situation. Tasks can 
be automated if there is a constant map¬ 
ping between a stimulus and a response; for 
instance, you are always supposed to stop at 
a red traffic light, and 2 + 4 always makes 6. 
Very few of the activities that we call “think¬ 
ing” can be entirely automated. Almost all 
of them contain elements that can be auto¬ 
mated. Here are two examples. 

Mathematics is not the same as arith¬ 
metic, but mathematicians know their arith¬ 
metic very, very well. Alexander Aitken, a 
leading British mathematician in the mid 
twentieth century, was also a formida¬ 
ble mental calculator. 126 Schoolchildren are 

124 Johnson & Bouchard, 2005. 

125 Berg, Hertzog, & Hunt, 1982. 

126 Nickerson, 2010, pp. 159-162. 
drilled until they know their multiplication
table, up to 10 x 10. Several mathematicians 
have been reported to know a good part of 
the multiplication table up to 100 x 100! It is 
important, though, to distinguish between 
arithmetical calculation and doing mathe¬ 
matics. Calculation is a tool for mathemat¬ 
ics, distinct from any deep understanding of 
mathematics. It is hard to imagine a math¬ 
ematician who could not calculate. On the 
other hand, there are numerous cases of cal¬ 
culating prodigies who had little mathemat¬ 
ical talent. 

What is going on? Mathematicians learn 
many arithmetical facts to the point that 
they are readily accessible from long-term 
memory. As is the case in many other areas 
of expertise, these facts are organized into 
coherent networks of relationships, so that 
they may be easily retrieved. There are spe¬ 
cialized brain areas associated with calcu¬ 
lation and simple mathematics, arithmetic 
and number representation. In both mathe¬ 
maticians and calculating prodigies the men¬ 
tal calculation areas simply run more effi¬ 
ciently than in most of us. Mathematicians 
utilize their efficient mental calculations in 
the service of deeper understanding of math¬ 
ematics. The calculating prodigies stop at 
arithmetic. 127 

And then we have verbal comprehen¬ 
sion. Here is a fragment of a poem writ¬ 
ten by Lewis Carroll, the author of Alice in 
Wonderland. 

“The time has come,” the walrus said
“to talk of many things: Of shoes and
ships - and sealing wax, - of cabbages
and kings.”

- Lewis Carroll, “The Walrus and
the Carpenter” (1872)

Figuring out the meaning of this passage 
is hard enough without wasting time search¬ 
ing for the meanings of walrus, shoes, ships, 
sealing wax, cabbages, and kings. Automated 
word retrieval leaves time for the hard tasks. 

An intelligent person will have reduced 
the processing time required to retrieve mil¬ 
lions of items of information that are useful 

127 Butterworth, 2006. 
in everyday life. And how will automation
be achieved? By practice. 

9.6.4. Training Working Memory 

The working memory-attentional control 
complex is a central part of the information¬ 
processing functions supporting intelli¬ 
gence. A demonstration that the efficiency 
of working memory can be improved is 
tantamount to a demonstration that intel¬ 
ligence can be improved. Why? 

Working memory is illustrated by tasks 
that require participants to do several things 
at once, such as simultaneously monitoring 
visual and verbal input streams, or suppress¬ 
ing information from one input stream while 
processing information from another. Such 
tasks are not demonstrations of intelligence 
in themselves. They are measures of the 
functioning of an information-processing 
system in the brain, primarily in the frontal 
and parietal cortices and in the cingulate cor¬ 
tex, that provides capacities vital for solving 
complex problems. 128 The brain has consid¬ 
erable plasticity; it can reorganize itself as 
a result of experience. Two recent studies 
have shown that practice on tasks involving 
working memory and the control of atten¬ 
tion will improve performance on intelli¬ 
gence tests. 

One was a study by a joint University of 
Michigan-University of Bern (Switzerland) 
group that emphasized the storage compo¬ 
nent of the working memory complex. 129 
The task used was an n-back task. In this 
task a stream of stimuli is presented, one
at a time. The participant indicates when 
the current item is a repetition of the item 
presented a certain number of items back. 
To illustrate, in the stream 


XXYZZXYX 


the third X is a repetition of the stimu¬ 
lus presented four items previously. In a 
four-back task the participant would be 

128 See Chapter 5 for a discussion. 

129 Jaeggi et al., 2008. 
Figure 9.13. A cartoon rendition of Posner and
Rothbart's attention-training task. The task is to 
press the key showing the direction of the center 
fish. The fish on the right and left of the center 
may point in the same or the opposite direction 
of the center fish. 

asked to indicate the repetition. College stu¬ 
dents attempted a version of the task in 
which they were required to monitor two 
simultaneously presented streams of stim¬ 
uli, one presented visually and one presented 
aurally. The students practiced for anywhere from one
to nineteen days. They also took a variety
of fluid intelligence tests before and after 
practicing the working memory tasks. Test 
scores increased after practice on the work¬ 
ing memory tasks, and, most impressively, 
the amount of increase was directly related 
to the amount of practice. 
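
The scoring rule for the n-back task is simple enough to state as code. The sketch below is a minimal single-stream version (the study itself used two simultaneous streams), with a hypothetical function name; it lists the positions at which a participant should respond.

```python
def n_back_targets(stream, n):
    """Return 0-based positions where the item repeats the item n positions back."""
    return [i for i in range(n, len(stream)) if stream[i] == stream[i - n]]

stream = "XXYZZXYX"                          # the example stream from the text
four_back = n_back_targets(stream, 4)
one_back = n_back_targets(stream, 1)

print("4-back targets at positions:", four_back)
print("1-back targets at positions:", one_back)
```

For the example stream the four-back rule picks out the third X (position 5, counting from zero) and also the Y that follows it, since that Y repeats the Y presented four items earlier.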

The second study was done by Michael 
Posner and Mary Rothbart of the University 
of Oregon. 130 Young children were trained 
on a visual attention task intended to force 
them to focus on one part of a visual 
scene while ignoring others. The task itself 
is shown in Figure 9.13. Compared to a 
control group, the children trained on the 
visual attention task showed improved performance
on the Kaufman children's test of
cognitive ability. 
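
The structure of a trial in the attention-training task can be sketched as follows. The encoding is my own assumption: each fish is written as "<" or ">" according to the direction it faces, and the function name is hypothetical. The correct response depends only on the center fish; the flanking fish either agree with it or compete with it.

```python
def flanker_trial(row):
    """Return (correct_response, congruent) for a row of '<' and '>' symbols."""
    mid = len(row) // 2
    center = row[mid]
    flankers = row[:mid] + row[mid + 1:]
    congruent = all(f == center for f in flankers)
    return center, congruent

print(flanker_trial("<<><<"))   # flankers point against the center fish
print(flanker_trial(">>>>>"))   # flankers point the same way as the center fish
```

Trials on which the flankers point the opposite way are the ones that force the child to ignore part of the visual scene.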

The working memory-attention complex 
is not the only information-processing func¬ 
tion that is both relevant to intelligence 
and trainable. Processing speed can also be 
trained. The mechanism for training is sim¬ 
ple and ubiquitous in our society: action 
video games. It has been shown that ado¬ 
lescents and young adults who play action 
video games perform better on a variety 
of tests of general processing speed than 
do their contemporaries who do not play 
the games. This cannot be entirely a selec¬ 
tion effect, for it is possible to take people 

130 Posner & Rothbart, 2007. 
who do not normally play games and have
them practice playing; they then show superior
performance on measures of cognitive
processing speed outside of the gaming
context. 131

Results such as these illustrate how prac¬ 
tice on tasks that themselves do not evalu¬ 
ate intelligence, in any usual sense of the 
word, can improve performance on the 
complex problems presented on intelligence 
tests. There is a close analogy between the finding
that intelligence test performance can be
improved by, in Posner and Rothbart's
term, “training the brain,” and the totally
unsurprising, but important, finding that
weight training can improve performance in
athletics.

However, it is important not to overin¬ 
terpret laboratory studies like these. They 
show how an environmental effect on intel¬ 
ligence could be produced, but they do 
not show that variations in intelligence are 
produced this way in the world outside 
the laboratory. Take, for example, the cor¬ 
relation between parental SES and chil¬ 
dren's intelligence. There are differences 
in the home environments of children in 
high and low SES homes, especially in 
the amount of exploratory activity permit¬ 
ted, parental encouragement of independent 
problem solving, and amount (and type) of 
TV watching. Do these differences trans¬ 
late into differences in opportunities to exer¬ 
cise the working memory-attention control 
complex? The question could be answered 
by a painstaking cognitive task analysis of 
children's environments. To my knowledge, 
no such analysis has been conducted. 

9.7. The Challenge Hypothesis 

The Challenge Hypothesis (Chapter 1) is 
a claim that people develop their intelli¬ 
gence when they rise to meet environmental 
challenges. Genetic inheritance establishes a 
reaction range, which determines the limits 
of development. The level of intelligence a 
person achieves within these limits depends 

151 Dye, Green, & Bavelier, 2009.
upon both the environment and the way the
person reacts to it. The following principles 
apply: 

1. The physical environment constrains 
the reaction range. The constraints gen¬ 
erally have the effect of driving intel¬ 
ligence downward, as in the case of 
alcoholism, poor nutrition, and atmo¬ 
spheric lead. Certain drugs related to 
the amphetamines can provide a tempo¬ 
rary enhancement of alertness, and can 
produce transient improvement of some 
cognitive functions, but their long-term 
effects are not known, and could well 
be deleterious. More generally, within 
the range of the physical environments 
present in post-industrial societies, ben- 
efitting from the physical environment 
is largely a matter of avoiding things that 
will make you stupid. 

2. Within a person's reaction range, real¬ 
ized intelligence is produced by inter¬ 
acting with the social environment. This 
includes the early home environment, 
the school, and in adult life, personal 
and professional experiences. 

3. The extent to which a person benefits 
from environmental challenges depends 
upon how the person engages with 
them. Sternberg 152 has identified three 
strategies for engagement. Adaption is 
accepting the situation and changing 
your own behavior to meet it. Decid¬ 
ing to study for an examination is a pro¬ 
saic example. Shaping is changing the 
situation to adjust to your abilities. This 
in itself can be a learning experience. 
Selection is finding a situation in which 
you can more easily prosper, given your 
current talents. When a student drops 
a tough class in favor of an easy one, 
he or she selects and, as the example 
shows, disengages. Each of these strate¬ 
gies makes sense in an appropriate situ¬ 
ation, but only adaption and shaping are 
likely to lead to expanded intelligence. 

152 Sternberg, 2003b; Sternberg, Grigorenko, & Zhang,
2008. 
I will now make some frankly speculative
remarks about society and even civilizations. 

Social environments vary in the extent 
to which they encourage the development 
of intelligence, because they vary in the 
extent to which they encourage engagement 
by adaption and shaping. A society that 
endorses the belief that intelligence is fixed 
inhibits the development of intelligence, 
because it encourages people to avoid chal¬ 
lenging problems. Look at this rationally. If 
you encounter a problem that appears diffi¬ 
cult, and if you believe that your own abil¬ 
ities are fixed, then the sensible thing to do 
is to select a new environment. 

This is particularly the case if the costs 
of failure are high. And in the case of 
schoolchildren and adolescents, the cost of 
failure in front of the peer group is very 
high. A person who engages with intellec¬ 
tually challenging, and hence intelligence- 
developing, problems is going to make 
errors. The important thing is what hap¬ 
pens next. There is an apocryphal saying 
that engineers analyze an accident in order 
to prevent the next one, while lawyers ana¬ 
lyze an accident in order to find someone to 
blame. If we want people to develop intel¬ 
ligence, we have to encourage them to take 
an engineer’s attitude, not a lawyer’s. 

Having established this attitude, society 
must provide the challenges. This is easy, 
providing that a society does not become 
fixed in a belief that the way it does things, 
now, is the only way to do them. This is 
a point where we can be optimistic about 
our society. The sociologist Carmi Schooler 
has pointed out that the rise in intelligence 
test scores across generations has been paralleled
by a rise in the complexity of society. 153
Personal finance provides a good illustration 
of Schooler’s point. It used to be that per¬ 
sonal finance was simple for all but the very 
wealthy. If you had the money to buy some¬ 
thing, you could; if you did not have the 
money, you could not. Today we have credit 
cards, adjustable rate mortgages, auto loans, 
student loans, and what have you. For most 

153 Schooler, 1998.


of the citizenry cash is passé, and thinking
about one's personal finances is challenging. 

There are people who have argued that the 
same could be said of our much-condemned 
video games. Some of them contain ele¬ 
ments of the techniques used to train visual- 
spatial reasoning, processing speed, and the 
working memory-attention complex. 154

A dynamic society challenges its mem¬ 
bers and, as it does so, redefines what it 
means to be intelligent. If a society is per¬ 
vaded by a belief that the old ways are the 
best ways (or the only ways), both the devel¬ 
opment and the definition of intelligence 
are constrained. Since history does move on, 
the environmentally driven improvement in 
intelligence that we are seeing today has 
happened before, and probably in a more 
extensive way than could be documented by 
changes in performance on the intellectual 
puzzles we call “intelligence tests.” 

The present human species has been 
around for roughly 100,000 years. The last 
five thousand have seen the development of 
ideas that have profoundly changed human 
existence. These include the concept of agri¬ 
culture, literacy, the use of formal laws to 
order social conduct, the development of 
mathematics and logic, and induction based 
on scientific reasoning. These ideas in turn 
made possible the spread of technologies 
that fostered further new ideas. Agricul¬ 
ture made economic specialization possible, 
which in turn led to urbanization, a need for 
formal laws, business transactions, surveying 
of land, and construction. Humanity was on 
the way toward literacy and mathematics. 

People in technologically and intellectu¬ 
ally demanding cultures respond to the chal¬ 
lenge, and move toward the top of their 
reaction ranges - at least with respect to 
those aspects of cognition that their cul¬ 
ture sees as important. The Sumerians were 
smarter than the nomads on the steppes, and 
the Roman soldiers who manned Hadrian's 
Wall were smarter than the Picts and Celts
outside it. 

154 See Greenfield, 1998, for some early examples, especially
in the spatial-visual realm, and Dye, Green, &
Bavelier, 2009, for more recent work. 
Nobody, and no culture, thinks of
everything. Charles Murray has coined the 
elegant term metainvention to describe ideas 
that have profound implications for society, 
such as logic and the rule of law. 135 He points 
out that virtually all metainventions have 
arisen along the Eurasian land mass, primar¬ 
ily in northern Europe, with a “recent” (since 
1600 CE) rise in the Americas, following the 
European migrations to the New World. 
Why? 

Galton and his colleagues in Victorian 
England would have said that this is because 
the northern Europeans are genetically 
superior, with respect to intelligence. There 
are scientists today who would agree with 
him, a topic that will be taken up in Chap¬ 
ter 11. The genetic hypothesis cannot be 
refuted on the basis of the geographic dis¬ 
tribution of intellectual contributions. But 
there is an environmental explanation. 

The ecologist Jared Diamond 136 has 
argued that great ideas and cultural inno¬ 
vations arise when societies meet, both 
to exchange and to challenge ideas. Dia¬ 
mond further believed that this psycho¬ 
logical/sociological process interacted with 
geography. The East-West orientation of 
Eurasia permits migration, travel, and trade 
literally from the Atlantic to the Pacific. 
Such movements produce exchanges and 
development of ideas. The Americas and 
Africa are oriented North-South, so that 
ecological barriers such as the jungles of 
Panama and the Sahara Desert discourage 
travel along the major axes of the conti¬ 
nents. A particularly telling example is in 
the New Guinea highlands, where Melane¬ 
sian peoples developed agriculture indepen¬ 
dently of its appearance in the Middle East, 
but otherwise remained in the Neolithic Era. 
Diamond argues that this was not because 
of a genetic difference between Asians and 
Melanesians; it was because of the isolation 
of the New Guinea highlands. 

Geographic isolation is not the only way 
to prevent the development of intelligence 
by engagement; social isolation works as 

135 Murray, 2003. 

136 Diamond, 1997. 
well. The Chinese Empire in many ways
failed to participate in the avalanche of ideas 
that started with the European Renaissance 
and the Age of Exploration. Why? At least 
in part because the conservative philosophy 
of Imperial China left the Chinese uninter¬ 
ested in other cultures. 

At both levels, the society and the indi¬ 
vidual, intelligence increases when there is 
a combination of a challenge and a willing¬ 
ness to rise to it. Humans will never produce 
a society that makes intellectual demands 
beyond the human potential - by defini¬ 
tion. How close individuals come to reach¬ 
ing their cognitive potential depends on how 
much thought their society demands from 
them. 


9.8. Summary: What Produces 
the Cohort Effect? 

Just as nonzero heritability coefficients show 
that genetic inheritance is an important 
determiner of intelligence, the rise in test 
scores throughout the twentieth century 
shows that environmental effects can be very 
powerful. These demonstrations, alone, do 
not tell us how genetic and environmental 
variables act upon intelligence. In Chapter 8 
we saw that great progress has been made 
toward finding out which genes reduce 
human intelligence markedly, but that it has 
been much harder to find the genes that pro¬ 
duce variation in the normal range. 

Much the same can be said of the phys¬ 
ical environment. We know that we should 
avoid poor nutrition, atmospheric lead, and 
other environmental hazards. Very little 
progress has been made toward identifying 
the positive aspects of the environment that 
further intellectual development, within the 
range of environments present in the indus¬ 
trially developed countries. 

Schooling is clearly important. Over the 
course of the twentieth century the devel¬ 
oped countries reached a goal of almost uni¬ 
versal education. In addition to increasing 
the level of education of the average person, 
the developed societies made a huge effort 
to reduce the number of school drop-outs. 
This is almost certainly one reason that the
cohort effect is largely driven by increases in 
the absolute level of test scores in the lower 
range of cognitive ability. 

The effects of changes outside of formal 
schooling are harder to establish. While 
some very intense pre-school programs, 
such as the Abecedarian effort, have shown
promising long-term results, projects of this 
intensity have affected only a few hun¬ 
dred students. This is not nearly enough to 
influence population-level IQ scores. More 
widespread programs, such as the American 
Head Start program, are probably not inten¬ 
sive enough to make permanent changes in 
cognitive capabilities. 

Family sizes have dropped precipi¬ 
tously in the developed countries. Simi¬ 
lar reductions in family size are now being 
seen worldwide. Even after allowance for 
parental intelligence is made, smaller families
produce somewhat more intelligent children.
However, these effects are too small to
account entirely for the cohort effect. 

Intelligence is statistically associated with 
a tendency to become intellectually engaged 
with challenging tasks. Although correla¬ 
tion does not mean causality, it is at least 
arguable that those people who engage in 
intellectually challenging activities improve 
their cognitive capacities - their intelligence 
in a far larger and more important sense 
than in the narrow sense of improving intel¬ 
ligence test scores. A plausible argument can 
also be made that throughout the twentieth 
century there was an increase in the cogni¬ 
tive complexity of the environment. People 
really were getting brighter, at least in the 
sense of greater acquisition of those cogni¬ 
tive skills evaluated by the tests. 

Are these intelligent people benefitting 
from their capabilities? In the next chapter 
we consider what intelligence is worth in the 
post-industrial society. 


CHAPTER 10 


What Use Is Intelligence? 


As of the end of the twentieth century, the 
United States is run by rules that are 
congenial to people with high IQ and that 
make life more difficult for everyone else. 

Herrnstein & Murray, 1994, p. 541 

The quotation from The Bell Curve: Intelligence
and Class Structure in American Life is
a pretty strong statement about the importance
of intelligence. When Herrnstein and
Murray made it they were attacked as elitist
and antidemocratic. Other people, with
impeccable democratic credentials, had said 
similar things in a less contentious way. Just 
a few years before Herrnstein and Murray 
wrote, Robert Reich, a sociologist who had 
served as Secretary of Labor in the Clin¬ 
ton administration, wrote that work has 
shifted from emphasizing the manipulation 
of objects to the manipulation of abstract 
ideas, varying from programming a robot 
to analyzing a financial system. 1 It follows 
that skill in manipulating abstract concepts, 
intelligence, has become progressively more 

1 Reich, 1991. 


valuable over time. To what extent do cog¬ 
nitive tests predict such skill? 

10.1. Problems in Investigating the 
Relationship between Intelligence 
and Success 

This chapter examines the relation between 
intelligence and success in three broad 
regions: academics, the workplace, and personal
life. These studies are not easy to
do, for several reasons. First, we have to 
specify what we mean by success in each 
arena. Next, we have to select quantita¬ 
tive measures of success. Observable mea¬ 
sures, such as grade point average or money 
earned, are frequently only partially satisfac¬ 
tory measures for our criteria. They often 
have undesirable statistical and measure¬ 
ment properties that hinder analysis and 
interpretation. Finally, there is the impor¬ 
tant problem of generality. We cannot study 
the totality of academia, the workplace, or, 
certainly, personal life. We have to study 
slices of them, where the necessary measures 
can be obtained. These slices are almost 
never random samples of the arenas, and
in only a few cases can we obtain the ideal 
experimental group-control group contrast. 

As is true in other areas of research on 
intelligence, we can learn something from 
imperfect studies. We just have to keep the 
imperfections in mind when we consider 
what has been learned. The rest of this sec¬ 
tion describes some of these imperfections. 
Do not lose sight of the magnificence of the 
forest because the trees have woodpecker 
holes in them! Quite a lot has been learned. 

10.1.1. The Conceptual Criterion Problem

The biggest problem is defining success. In 
the academic arena a student is successful if 
he or she has learned. The commonest mea¬ 
sure of academic success is a person’s grade 
point average (GPA) across classes. How¬ 
ever, grade point averages are not compara¬ 
ble across classes or institutions. A student 
with a GPA of 3.5 in English classes in a 
community college is not necessarily a bet¬ 
ter student than one with a GPA of 3.1 in 
Physics at Stanford. Merging gross measures 
of learning, such as GPA, across subjects or 
across schools introduces unwanted sources 
of variance. This will make the intelligence- 
GPA relation appear to be smaller than it 
is. But measuring the relation in one class 
or institution raises questions about how the
finding can be generalized. 

An alternative measure of success is grad¬ 
uation or, in the K-12 system, its inverse, 
dropping out. Once again we have noncom¬ 
parability across institutions; without nam¬ 
ing names, not all our high schools, colleges, 
and universities are equivalent. Americans 
keep school records by district or state, not 
by a national register. If a student disappears 
from a K-12 system or fails to complete a 
postsecondary program, there is no record 
of where that student went. They may have 
dropped out, or they may have enrolled at 
another educational institution. 

It is even harder to define success in the 
workplace. Within an industry or occupa¬ 
tion income partially captures the idea of 
success, but incomes across occupations are 
hard to compare. Incomes are also often
determined by variables unrelated to intelligence,
such as seniority of employment.
Some of our larger companies do keep 
records of periodic evaluations of employee 
performance, most commonly supervisors’ 
ratings. Ratings are not reliable unless the 
raters are trained and the criteria for rating 
have been agreed upon. Objective measures 
of employee output are often hard to come 
by and generally capture only a part of a 
person’s job. Costco, a giant warehouse
sales company, tracks the number of checkouts
per hour that each of its check-out
clerks handles. It does not directly measure
things like customers’ reactions to a clerk’s 
manner. 

Defining success in life is even harder. 
We can measure extreme social adjustment, 
which can vary from achieving a civic prize 
to going to jail, but most people do neither. 
Success in life is a multifaceted thing. Infor¬ 
mative studies have been conducted of the 
relation between intelligence and particular 
aspects of life success, such as health, but 
trying to relate intelligence, or virtually any 
other trait, to such a nebulous thing as “life
success” is probably not a useful exercise. 

Once we have defined our criteria we 
face the problem of actually getting the 
data. Several strategies have been followed. 
One is to conduct an experimental study, in 
which the investigator obtains measures of 
both intelligence and success from a selected 
set of participants. To take an example, 
one study related intelligence test scores 
to success as a race track gambler. 2 Such 
studies tend to be fairly small and to deal 
with unique situations. Because they are 
small, they can detect only large relation¬ 
ships. (Technically, they have low statisti¬ 
cal power.) This brings us to a discussion of 
statistical issues. 

10.1.2. The Statistical Problems 

We measure the extent to which intelligence 
is related to some index of success by cal¬ 
culating predictive validity, which is defined 
as the correlation between a measure of 

2 Ceci & Liker, 1986. 
intelligence and the criterion measure. The
process is sensitive to three statistical issues: 
reliability, range restriction, and gener¬ 
alization. In order to understand them 
we need a brief digression into statistical 
reasoning. 

THE RELIABILITY ISSUE 
Any measurement contains two elements, 
a “true value” and a residual term. While 
the residual term is frequently referred to as 
“error,” it is not necessarily error in the sense 
of a mistake. It refers to the sum of all influ¬ 
ences on the measured variable that are sta¬ 
tistically independent of the true value. To 
take an example, consider the way in which 
weight is measured during the typical annual 
physical. Examinees are told to take off their 
shoes and stand on a scale. Measured weight 
is then shown on the scale. The measured 
weight has the following components: 

Measured weight = actual body weight + 
(weight of clothes + scale bias), 

where scale bias refers to any tendency of 
the scale to weigh high or low. The terms 
in parentheses, here weight of clothes and 
scale bias, are residual effects, uncorrelated 
with the examinee’s actual weight. If an 
examinee were to be weighed on a different 
scale, wearing different clothes, measured 
weight might change even though actual 
body weight remained the same. Measured 
weight is said to be reliable to the extent that 
the same measure is obtained across compa¬ 
rable conditions. This reasoning applies to 
intelligence testing. 

An intelligence test score x is determined 
by the examinee’s “real” intelligence and a 
residual term that is unique to the exami¬ 
nation of that person at that time. Exactly 
the same thing can be said of an academic 
grade, y. The grade is determined in part by 
what the student really knows about, say, 
English Literature and in part by a residual 
term unique to the examination and the per¬ 
son. Symbolically, 

$$x = x_t + x_e, \qquad y = y_t + y_e, \tag{10.1}$$


where the subscript t stands for “true” and 
e denotes the residual term. Now define the 
reliability of an intelligence test or a grade as 
the correlation between two measures, each 
assumed to be equally good, taken on the 
same person. Examples would be the corre¬ 
lation between two equivalent forms of the 
SAT, or the correlation between the grades 
assigned to the same set of English Litera¬ 
ture examinations by two equally qualified 
graders. 
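
The definition of reliability as a correlation between two equally good measures can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the text: true scores and residuals are drawn from normal distributions, the residuals of the two forms are independent, and the function names and standard deviations are invented for the illustration. Under those assumptions the correlation between the two forms should approach the share of observed variance that is due to the true score.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_reliability(true_sd, residual_sd, n=20000, seed=0):
    """Correlate two simulated parallel forms: same true score, fresh residuals."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, true_sd) for _ in range(n)]
    form_a = [t + rng.gauss(0.0, residual_sd) for t in true_scores]
    form_b = [t + rng.gauss(0.0, residual_sd) for t in true_scores]
    return pearson(form_a, form_b)


def expected_reliability(true_sd, residual_sd):
    """True-score variance as a fraction of total observed variance."""
    return true_sd ** 2 / (true_sd ** 2 + residual_sd ** 2)


if __name__ == "__main__":
    # Hypothetical values: residual spread half as large as true-score spread.
    print(simulate_reliability(1.0, 0.5), expected_reliability(1.0, 0.5))
```

With these hypothetical values the simulated correlation should land close to the expected value of .8; making the residual term larger drives the reliability down.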

What we can observe is the correlation
between test scores and grades, $r_{xy}$.
What we want to know is the correlation
between intelligence and academic achievement,
$r_{x_t y_t}$. This is

$$r_{x_t y_t} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}, \tag{10.2}$$

where $r_{xx}$ and $r_{yy}$ are the reliability correlations
for the intelligence test, x, and the
academic measure, y. The correlation $r_{x_t y_t}$ is
sometimes referred to as the “true” correlation.

As reliability coefficients range between
0 and 1, the denominator, $\sqrt{r_{xx}\, r_{yy}}$, will
also range between zero and one. Therefore,
the correlation between the “true” variables,
$r_{x_t y_t}$, will be at least as big as, and generally
larger than, the correlation between
the observed variables, $r_{xy}$. Corrections for
unreliability have to be treated with caution,
as the reliabilities are themselves estimates,
and if they are too low the estimated true
correlation can exceed one, which is obviously
not correct.
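
The correction for unreliability divides the observed correlation by the square root of the product of the two reliabilities. The sketch below is only an illustration: the function name is invented, and the sample values are the reliabilities of .88 and .6 that this section uses for a test and a within-class grade.

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the 'true' correlation from an observed one.

    r_xy is the observed correlation; r_xx and r_yy are the reliabilities
    of the two measures. The result is r_xy / sqrt(r_xx * r_yy).
    """
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


if __name__ == "__main__":
    # Multiplier implied by reliabilities of .88 (test) and .60 (grade).
    print(round(correct_for_attenuation(1.0, 0.88, 0.60), 2))  # 1.38
    # Hypothetical low reliabilities push the estimate past 1.0,
    # which is the warning sign described in the text.
    print(correct_for_attenuation(0.60, 0.50, 0.50) > 1.0)  # True
```

The function deliberately does not cap its result at one. An estimate above one is a signal that the reliability estimates are too low, and hiding it would hide the problem.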

In both the academic and industrial 
cases the difference between the observed 
correlation and the (estimated) true correla¬ 
tion can be substantial. Professionally devel¬ 
oped intelligence tests generally have relia¬ 
bilities of .85 or above, but this is because 
of a great deal of careful item selection 
(Chapter 2). Large-scale academic achieve¬ 
ment tests, such as those used in the United 
States to assess educational progress on 
a statewide basis, 3 have similar reliability 

3 As of 2009 such tests were required by federal law -
the No Child Left Behind Act.
coefficients. Within-class, teacher-assigned
grades are quite another matter. I know of 
one case where thirty essays were graded, 
independently, by two university professors.
The correlation between the two sets
of grades was .31. 4 Fortunately, this is an
extreme. In most cases grades have reliabili¬ 
ties in the .6 to .8 range. This means that if a 
typical study of the relation between intel¬ 
ligence and grades within a class produces a 
value of r, that value should be multiplied 
by approximately 1.38 to estimate the true 
correlation between intelligence and aca¬ 
demic achievement. 5 This substantial cor¬ 
rection applies to studies of grades within a 
class. As the GPA is an average across classes, 
the reliability of the GPA is much higher 
than the reliability of a grade within a class, 
so the correction would be smaller. 
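
One way to see why an average across classes is more reliable than a single grade is the Spearman-Brown formula from classical test theory. The text does not name this formula. Applying it here rests on the assumption that each class grade is an equally good, independent measure of the same underlying achievement, which real transcripts satisfy only roughly. The function name and the figure of eight classes are hypothetical.

```python
def reliability_of_average(single_reliability, k):
    """Spearman-Brown reliability of the mean of k parallel measures."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * single_reliability / (1.0 + (k - 1) * single_reliability)


if __name__ == "__main__":
    # A single grade with reliability .60, averaged over 8 hypothetical classes.
    print(round(reliability_of_average(0.60, 8), 2))  # 0.92
```

Under these assumptions a GPA built from eight grades of reliability .6 would have a reliability of about .92, so the correction for unreliability of the criterion becomes correspondingly small.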

Probably the most commonly used crite¬ 
rion for achievement in industrial settings is 
a supervisor's rating of performance. Unless
these ratings are the result of carefully struc¬ 
tured evaluations they are likely to have reli¬ 
abilities in the .6 range or lower, consider¬ 
ably lower than the typical reliabilities of 
cognitive tests. 

RESTRICTION OF RANGE 
In most studies the variability of intelli¬ 
gence in the group actually studied will be 
smaller than the range in the population to 
which we wish to generalize. The problem 
of estimating intelligence-grade relations in 
elementary schools (K-5) illustrates the sit¬ 
uation. Elementary schools generally draw 
students from the neighborhood immedi¬ 
ately around them. In most industrially 
developed countries neighborhoods tend to 
have distinct socioeconomic and sometimes 
demographic characteristics. Therefore, the 
student demographics in a single elementary 
school will be more homogeneous than in 
the district or state. We say that the scores 
are subject to range restriction. In the case 

4 The statement is based on personal observation. 

5 This conclusion follows from the following argu¬ 
ment. Assume that the reliability of the intelligence 
test is .88 and the reliability of the grade is .6. By 
application of equation 10.2, the observed correlation
should be multiplied by $1/\sqrt{.528} = 1.38$.


of the elementary schools, we would expect 
there to be less variability in a measure of 
intelligence within a school than across a 
state. 

Range restriction influences the correlation
coefficient. Let $\sigma_s$ be the standard deviation
of observed scores in the sample, and
$\sigma_p$ be the standard deviation in the population
(in the school and in the district,
in the elementary school illustration). The
relation between the observed correlation in
the sample, $r_s$, and the correlation to be estimated
in the population, $r_p$, is

$$r_p = \frac{r_s\,(\sigma_p/\sigma_s)}{\sqrt{1 - r_s^2 + r_s^2\,(\sigma_p/\sigma_s)^2}}. \tag{10.3}$$

In this equation $r_p$ is greater than $r_s$ if
$\sigma_p > \sigma_s$. Note that this corrects only for
attenuation on one of the two variables, 
either the intelligence variable or the crite¬ 
rion variable. Correction on both variables 
is also possible and often reasonable. For 
instance, suppose that a study were done in 
which we observed the correlation between 
intelligence test scores and scores on an 
achievement test in a school, and we wanted 
to estimate the correlation in the state. It 
would be appropriate to correct for range 
restriction on both the intelligence test and 
the achievement test. 

Selection restriction is an important special 
case of range restriction. Selection restric¬ 
tion occurs whenever an applicant popula¬ 
tion takes a predictor test, here some sort 
of intelligence test, as part of application 
for a job or educational opportunity. All 
applicants above a given cut score are then 
accepted, and their performance on the job 
or in the school is recorded. For exam¬ 
ple, suppose a university uses an entrance 
examination, and admits the top 50% of the 
applicants. In order to validate the entrance 
examination, university officials would want 
to know if it was a good predictor of the 
grades that an applicant would obtain. How¬ 
ever, grades are available only for the admit¬ 
ted students. Since, obviously, the top 50% 
of the applicants will have less variation in 
examination scores than the entire group of
applicants, the correlation between exam¬ 
ination scores and grades in the applicant 
population can be estimated by comput¬ 
ing the examination-grades correlation in 
the admitted group, and then correcting for 
range restriction. 
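
How much selection shrinks the spread of examination scores can be worked out if one is willing to assume that scores are normally distributed in the applicant population. That assumption is mine, made for the illustration, and the function name is invented. The sketch uses the standard variance formula for a normal distribution truncated from below.

```python
from statistics import NormalDist


def sd_ratio_after_selection(fraction_admitted):
    """SD of the admitted group divided by SD of all applicants.

    Assumes normally distributed scores and strict top-down selection:
    everyone above the cut score is admitted, nobody below it.
    """
    if not 0.0 < fraction_admitted < 1.0:
        raise ValueError("fraction_admitted must be strictly between 0 and 1")
    std_normal = NormalDist()
    cut = std_normal.inv_cdf(1.0 - fraction_admitted)  # cut score in z units
    hazard = std_normal.pdf(cut) / fraction_admitted   # mean z of admitted group
    variance = 1.0 + cut * hazard - hazard ** 2        # truncated-normal variance
    return variance ** 0.5


if __name__ == "__main__":
    # Admitting the top half of applicants.
    print(round(sd_ratio_after_selection(0.5), 2))  # 0.6
```

For a university that admits the top 50% of applicants, the standard deviation among admitted students comes out at about .6 of the applicant standard deviation. More stringent selection shrinks the ratio further.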

Corrections for restriction in range can be
substantial. A reasonable value for the hypothetical
university example is $\sigma_s = .6\,\sigma_p$. 6
An observed correlation of .33 in the selected
students would be corrected to .50 for the
applicant population. As correlations are
often squared and reinterpreted as representing
“percentage of variance accounted
for,” this would change $r^2$ from 11% to 25%,
a substantial change.
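
The worked example can be checked numerically. The sketch below assumes the standard single-variable correction for range restriction (often called Thorndike's Case 2), $r_p = r_s u / \sqrt{1 - r_s^2 + r_s^2 u^2}$ with $u = \sigma_p/\sigma_s$. I take this to be the correction intended here because it reproduces the figures in the example. The function name is invented.

```python
import math


def correct_for_range_restriction(r_sample, sd_sample, sd_population):
    """Estimate the population correlation from a range-restricted sample.

    Applies the single-variable correction, which assumes the same linear
    relation holds inside and outside the selected range.
    """
    u = sd_population / sd_sample
    return r_sample * u / math.sqrt(1.0 - r_sample ** 2 + (r_sample * u) ** 2)


if __name__ == "__main__":
    # Sample SD is .6 of the population SD; observed correlation is .33.
    r_p = correct_for_range_restriction(0.33, 0.6, 1.0)
    print(round(r_p, 2))                              # 0.5
    print(round(0.33 ** 2, 2), round(r_p ** 2, 2))    # 0.11 0.25
```

When the two standard deviations are equal the function returns the observed correlation unchanged, which is the behavior one wants for a sample that is not restricted.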

Because corrections for reliability and 
range restriction can be considerable, know¬ 
ing when to use them is important. Here are 
some general rules. 

1. Correction for reliability is appropri¬ 
ate when one’s interest is in theoreti¬ 
cal constructs underlying measures - for 
instance, whether intelligence as a con¬ 
cept is related to academic ability as a 
concept. The correction is not appro¬ 
priate when one’s interest is in whether 
one set of scores predicts the value of 
another set of scores - for instance, if 
you wanted to know whether the SAT 
predicts first-year college GPA. 

2. Correction for selection restriction 
should be done whenever the purpose 
of the study is to determine the valid¬ 
ity of a predictor, such as an entrance or 
hiring examination. 

3. Correcting for range restriction is appro¬ 
priate when the available observations 
are known to be a nonrandom sample 
in which scores are less variable than 
they are in the population. The case of 
using observations within a single school 
to estimate a population in the district 
is an example. However, in such cases 

6 In the case of selection restriction the percentage 
of applicants accepted determines the relationship 
between the variances in the sample and the popu¬ 
lation. In other cases of range restriction this has to 
be estimated. 
correcting for range restriction will be
possible only if an estimate of the pop¬ 
ulation standard deviation is available. 

4. If the sample is a true random sample of 
the population, the correction for range 
restriction should not be used. 

5. Any correction for range restriction car¬ 
ries with it the assumption that the same 
(linear) relation holds between scores
in the sample and in the population. 
This is not a trivial assumption. To 
take one example, there is evidence that 
the relation between IQ test scores and 
adult age is nonlinear. Scores decline 
more sharply with age beyond sixty 
than before. Therefore, it would not be 
appropriate to apply range restriction to 
estimate the age-IQ relation in adults 
from a sample of adults age sixty and 
older. 

Rule 5 leads to a discussion of our last 
statistical issue, power. 

Statistical power. To explain these issues, 
we need a bit of notation and a review of 
introductory statistics. 

By tradition, scientific results are said to
be “statistically significant” if they would be
obtained by chance only in fewer than 1 out
of 20 studies (p < .05) or fewer than 1 out
of 100 studies (p < .01), on the assumption
that the variables being studied in a sample
are actually unrelated in the population
(the “null hypothesis”). In research on intelligence,
“unrelated” means that in the population
there is no correlation between the
predictor (an intelligence test score) and the
criterion, $r_p = 0$. However, $r_p$ cannot be
observed directly. Instead it is estimated by
an observed correlation, $r_s$, in a sample of N
observations.

Assuming that the sample can be
regarded as being chosen randomly from the
population, there will be some critical value
of the observed correlation, $r^*$, such that
if the observed correlation, $r_s$, exceeds that
value ($r_s > r^*$) we reject the null hypothesis
that $r_p = 0$ at some level, p, where p refers to
the probability of observing $r_s > r^*$ if the null
hypothesis is true. The value of $r^*$ increases
if we lower the significance level (typically
from p = .05 to p = .01) and decreases as
the size of the sample, N, increases. To take
some examples, at the p < .05 level the critical
value, $r^*$, is .36 for a study with 30 observations
(N = 30), and .20 for a study with
N = 100. At the p < .01 level the values are
.46 and .26.
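
These critical values can be approximated with a few lines of code. The sketch below is not an exact method: it uses Fisher's z transformation, under which the transformed sample correlation is roughly normal with standard deviation $1/\sqrt{N-3}$ when the population correlation is zero, and it assumes a two-tailed test. Both choices, and the function name, are assumptions made for the illustration. Published tables are based on the t distribution and can differ in the third decimal place.

```python
import math
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Approximate two-tailed critical value of a sample correlation.

    Uses the Fisher z approximation: atanh(r) is treated as normal with
    standard deviation 1 / sqrt(n - 3) under the null hypothesis.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z_crit / math.sqrt(n - 3))


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        for n in (30, 100):
            print(f"alpha={alpha}, N={n}: r* = {critical_r(n, alpha):.2f}")
```

Run as written, the approximation gives .36 and .20 at the .05 level and .46 and .26 at the .01 level for N = 30 and N = 100, in agreement with the values quoted above to two decimal places.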

This much is taught in elementary statistics.
The second point is taught but often
not stressed. Suppose that the sample correlation
is less than the critical value, $r_s < r^*$.
This means that we cannot reject the null
hypothesis. “Not rejecting” is not the same
as accepting. What we have is what, in law,
would be called a verdict of “not proven.” 7

Suppose that the population correlation
is some value other than zero. (For simplicity,
consider only positive values.) There
would still be some probability that the sample
correlation fell below the critical value -
that we observe $r_s < r^*$ even though $r_p > 0$.
This probability depends upon what the
population value is, so the probability has
to be specified given a population value and
the size of the study, $\Pr(r_s < r^* \mid r_p = k, N)$.
The power of a study is the complement of
this,

$$\mathrm{Power}(r_p = k, N) = \Pr(r_s > r^* \mid r_p = k, N). \tag{10.4}$$

In words, this is the probability that a sample
of size N, drawn from a population in which
the population correlation has value k, will
have a sample correlation above the critical
value. Going back to the earlier example,
suppose that the population correlation is
.50. If we set the significance level at p < .05,
the power of a study with a sample size of 25
is .84. This means that 16 out of 100 samples
will not reach a value reliably greater than
zero even though the population correlation
is a substantial .50.

Power increases with sample size. In the 
example just given, if the sample is increased 
to 100, the power is greater than .995. 
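Equation 10.4 can be evaluated approximately without tables. The following sketch is an assumption-laden illustration, not the procedure behind the tabled values: it uses the Fisher z approximation and, by default, a one-tailed test. The function name `power_r` and its parameters are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def power_r(r_p, n, alpha=0.05, tails=1):
    """Approximate Pr(r_s > r*) when the population correlation is r_p > 0.

    Assumptions of this sketch: Fisher z approximation, and only rejections
    in the positive direction are counted (for tails=2 the opposite tail is
    ignored, which matters little when r_p is clearly positive).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / tails)
    shift = atanh(r_p) * sqrt(n - 3)
    return 1 - NormalDist().cdf(z_crit - shift)


print(round(power_r(0.50, 25), 2))   # small study
print(round(power_r(0.50, 100), 4))  # larger study
```

Under these assumptions the N = 25 case comes out near .82, close to but not identical with the tabled figure, and the N = 100 case exceeds .995. The point of the sketch is how quickly power climbs with sample size.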

7 Such verdicts are not allowed in US courts, but they
are allowed in some countries.

The power problem becomes critical
when it is combined with the problem of
criterion reliability. Grades within a class
and employer rating systems will often have
a reliability of around .60, and intelligence
tests will have a reliability of about .85. Suppose
that the true correlation between intelligence
and academic ability, the hypothetical
variables underlying these measures, is
.50 in the population. A bit of algebraic
manipulation of equation 10.3 will show
that the expected population correlation
between test scores and grades is .26 after
allowing for attenuation. Setting the significance
level at .05, the power of a study
with 25 participants is approximately .36. 8
About two out of three studies of this size
would not provide strong enough evidence
to reject the null hypothesis that $r_p = 0$,
even though it is false. If the sample size
were to be increased to 100, power would
increase to about .75. In this case failure to
reach statistical significance would be reasonable
evidence against the hypothesis that
a “true score” value of $r_p$ was .50 or larger.

What these examples show is that power 
is produced by an interaction between the 
reliability of the measures and the size of 
the study. This interaction has to be taken 
into account in evaluating null results. Do 
people actually fail to do this? The answer 
is, stunningly, “yes.” Panel 10.1 presents the 
case of a widely cited study in which no con¬ 
sideration was given to these issues. 

10.1.3. Drawing Conclusions in the Face 
of Statistical Uncertainties 

Given all these problems, can any conclu¬ 
sions at all be drawn? The answer is “yes,” 
but only after careful consideration. 

When evaluating empirical results we 
have to consider which statistic is appropri¬ 
ate. Are we interested in the observed cor¬ 
relation, or should the correlation be cor¬ 
rected for reliability and/or restriction in 
range? The rules given in the previous sec¬ 
tion apply. 

We must be aware of power consider¬ 
ations. We need to be especially wary of 

8 Power estimates are based on Table 3.3.2 in Cohen, 1988.


Panel 10.1. A Day at the Races:
A Failure to Consider Power and Reliability

In 1986 two Cornell University psychologists,
Stephen Ceci and J. K. Liker, published
an eye-catching article entitled “A
Day at the Races.”* They reported a four-year
study of the expertise of a group of
thirty habitual bettors on harness racing.
Ceci and Liker did not study the accuracy
with which these bettors predicted
winners because, as they said and as
many horse racing fans know, the winners
are often determined by unpredictable
events. Instead they studied the accuracy
with which the bettors were able to predict
the favorite and top three favorites at
post time (the start of the race), given the
extensive information about each horse
that was contained in the daily racing
form, which is available to bettors prior
to a race. Although secondary references
often misinterpret this, what Ceci and
Liker actually studied was their participants'
ability to predict how other bettors
would place their bets, not which horse
would win.

Ceci and Liker found that the participants’
decision was a mathematically
complex function of the information contained
on the racing form, and that the
accuracy of the predictions had a correlation
of -.03, essentially zero, with the
participants’ IQ scores on the short form
of the WAIS.† Ceci and Liker drew the
following strong conclusion:

(a) IQ is unrelated to performance at
the racetrack but, more important (b)
IQ is unrelated to real-world forms of
cognitive complexity.

Ceci & Liker, 1986, p. 255

These are strong words indeed. The 
null finding was claimed to be reliable, 
and the task, something that is related to 
but not the same as picking the winners in 
a race, was unhesitatingly generalized to 
the universe of complex tasks. Nowhere 
in the article was there any mention of 
reliability or power. 

Douglas Detterman and his colleague
Kathleen Spry wrote a detailed critique
of the Ceci and Liker study.‡ Among
other things, they observed that Ceci and
Liker’s criterion, the ability to predict the
odds at post time, had a reliability of at
most .41. What does this mean? Suppose
that the correlation between the underlying
abilities, intelligence and skill at setting
the odds, is 1. In terms of the text,
$r_p = 1$. The reliability of the short form
of the WAIS is known to be .85. Therefore,
the expected value of the correlation
in the sample would be .85 × .41 = .35. If
N = 30, the power of the Ceci and Liker
study would be approximately .5, which
means that a study like theirs should fail
to reach the conventional .05 level of statistical
significance five out of ten times even
if the underlying correlation was one.

Of course, nobody thinks that the correlation
between intelligence and race
track betting is identically one. Based on
meta-analysis, a widely quoted estimate
of the correlation between intelligence
and performance on a cognitively complex
task is $r_p = .5$.§ To be generous,
increase this to .6. Then, solely on the
basis of reliability, the expected sample
correlation would be $r_s = .21$. Using this
estimate, the power of the Ceci and
Liker study was .20; studies like theirs
should fail to reach the .05 level of significance
four out of five times.
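The arithmetic in this panel can be retraced in a short sketch. It follows the panel in multiplying the assumed underlying correlation by the two reliabilities, and then estimates power with a Fisher z approximation and a two-tailed .05 test. Both the approximation and the two-tailed choice are assumptions of this illustration; `approx_power` is a hypothetical helper.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_power(expected_r, n, alpha=0.05):
    """Approximate two-tailed power for a positive expected sample correlation."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = atanh(expected_r) * sqrt(n - 3)
    # Only rejections in the positive direction are counted; the opposite
    # tail contributes almost nothing when expected_r is positive.
    return 1 - NormalDist().cdf(z_crit - shift)


WAIS_RELIABILITY = 0.85       # value given in the panel
CRITERION_RELIABILITY = 0.41  # upper bound attributed to Detterman & Spry
N_BETTORS = 30

for underlying_r in (1.0, 0.6):
    expected_r = underlying_r * WAIS_RELIABILITY * CRITERION_RELIABILITY
    power = approx_power(expected_r, N_BETTORS)
    print(underlying_r, round(expected_r, 2), round(power, 2))
```

Under these assumptions the two cases give power of roughly one half and one fifth, in line with the figures quoted in the panel.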

The Ceci and Liker study presents 
us with good news and bad news. The 
good news is that when a published 
study contains major flaws, other scien¬ 
tists point out the errors. The bad news 
is that almost no one notices the correc¬ 
tion. The Ceci and Liker study has been 
cited ninety times, as evidence that intel¬ 
ligence, as measured by the IQ tests, is 
not related to real-world cognition. The 
Detterman and Spry study has been cited 
seven times.** 

* Ceci & Liker, 1986.

† This figure is a correction to the original value,
provided in Ceci & Liker, 1987.

‡ Detterman & Spry, 1988. This much-neglected
article contains several other strong criticisms of
the Ceci and Liker work.

§ Schmidt & Hunter, 1998.

** Data collected from an ISI Web of Knowledge
citation search, July 2, 2009.
concluding that there is no relation between
intelligence and some other variable when
the study involved is small or uses a measure
of unproven reliability.

One way to address the power issue is 
to do studies with a very large number of 
participants. The extreme case is to uti¬ 
lize large surveys, such as the Department 
of Labor longitudinal studies of US citi¬ 
zens described in panel 9.9. Sometimes sur¬ 
veys are “constructed” by analyzing records 
of intelligence, health, educational accom¬ 
plishment, and occupational status that have 
been collected for other purposes. 

Obviously, though, the larger the study 
and the more time required of the partic¬ 
ipants, the more costly the experiment or 
survey. In many cases the investigator has 
to accept less-than-ideal measures, such as 
using a brief vocabulary test as a proxy for a 
measurement of intelligence, or using place 
of residence as the sole measure of a par¬ 
ticipant's socioeconomic status. Such com¬ 
promises are not fatal errors; they are things 
that have to be considered when evaluating 
results. 

Another way to address the power problem
is to use a statistical technique called
meta-analysis to draw conclusions from multiple
studies. 9 Special statistical methods are
used to identify trends that may not be clear
from focusing on the details of any one
study. This technique can be quite revealing.
However, there are reservations.

As is the case for any statistical technique,
generalizations based upon any unjustifiable
assumptions of random sampling are suspect.
For example, many studies have been
conducted of the relation between scores on
college entrance examinations and student
grades. Meta-analysis can be, and has been,
applied to these studies. The participating
institutions tend to be the larger institutions,
with budgets sufficient to support internal
research organizations. Therefore, the result
of a meta-analysis of such studies is a useful
descriptive statement, but any appeal based
upon a claim of random selection of institutions
is questionable.

9 Hunter & Schmidt, 1990. 


The individual studies reviewed in a 
meta-analysis will, inevitably, vary in the 
quality of the measurements taken and the 
procedures used. These considerations are 
judgments that have to be made by con¬ 
sidering the details of each study. They do 
not lend themselves to statistical treatment. 
All reviewers will agree, for instance, on the 
number of participants in a study, and the 
effect this has on statistical power. They may 
not agree on the appropriateness of the mea¬ 
sures used, or the way in which the mea¬ 
sures were taken. Some meta-analyses have 
attempted to deal with this problem by clas¬ 
sifying studies by their perceived quality, 
and then analyzing high- and low-quality 
studies separately, to see if this makes any 
difference. A finding that only appeared 
in low quality studies would certainly be 
treated with suspicion. 
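The pooling step at the heart of a simple meta-analysis of correlations can be sketched as follows. This is a minimal illustration of a sample-size-weighted mean correlation, one common starting point. The study values are hypothetical, invented purely for the example, and a full analysis would also correct for artifacts such as unreliability and range restriction.

```python
# Hypothetical (sample size, observed correlation) pairs.
# These numbers are invented for illustration; they are not from any real study.
studies = [(25, 0.10), (40, 0.35), (30, 0.22), (120, 0.28)]

total_n = sum(n for n, _ in studies)

# Each study's correlation counts in proportion to its sample size, so that
# small, noisy studies pull the pooled estimate less than large ones.
weighted_mean_r = sum(n * r for n, r in studies) / total_n

# The unweighted mean treats a 25-person study like a 120-person one.
unweighted_mean_r = sum(r for _, r in studies) / len(studies)

print(total_n, round(weighted_mean_r, 3), round(unweighted_mean_r, 3))
```

The pooled estimate rests on far more observations than any single study, which is why it can reveal a trend that the individual small studies lack the power to detect.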

10.1.4. Problems Related to 
Research Design 

The final problems to be considered have to 
do with research design, rather than statis¬ 
tics and measurement. 

The ideal research design is a prospective 
study, in which the investigator obtains data 
on the intelligence of people at some point in 
their lives, ideally before they enter an aca¬ 
demic program or the workforce, and then 
determines how well they succeed. This is 
by far the easiest kind of study to interpret. 
However, it is possible only if the investiga¬ 
tor has some way of testing a large number 
of people, and then following them for a rea¬ 
sonably long period. There are a few studies
that have done this. The Seattle Longitudinal
Study 10 (panel 9.2) and the National
Longitudinal Studies (panel 9.9) are important
examples. However, such studies are
expensive, and so are few and far between.

Prospective studies can sometimes be 
conducted by examination of government 
records. Studies of this sort have been car¬ 
ried out in those European countries in 
which eighteen- to twenty-year-old men 
have to register and be tested for potential 

10 Schaie, 2005. 
military enlistment. (As far as I know, Israel
is the only country that requires registration
for both men and women.) Valuable information
can be gained if some of the registrants
can be reinterviewed later in life,
to determine how well they have fared.
In some countries this can be done without
actually interviewing the individuals,
because the government keeps extensive
records of the health, education, and income
of its citizens. Legal and ethical issues concerning
access to such data have to be
resolved, but the important point is that the
studies often can be done.

The alternative to a prospective study of 
the relation between intelligence and suc¬ 
cess is a retrospective study. In a retrospec¬ 
tive study a group of people are identi¬ 
fied who have, or do not have, varying 
degrees of social success. The investigator 
then attempts to determine their intelli¬ 
gence, either by direct testing or by exam¬ 
ination of relevant records. Studies of emi¬ 
nence or genius often fall into this category. 
The investigator identifies a group of indi¬ 
viduals who meet some criterion for accom¬ 
plishment and then tries to identify the com¬ 
mon characteristics of the group. Possibly 
one of the most ambitious of these stud¬ 
ies was Simonton's determination of the 
correlation between a measure of intellec¬ 
tual capacity, reconstructed from historical 
records, and historians’ ratings of the perfor¬ 
mance of the forty-two US presidents, from 
George Washington through George Bush. 11 
The correlation was .56. 

Studies of the relation between intelligence
and success are prone to collinearity
problems. To illustrate, intelligence test
scores during adolescence are positively correlated
with subsequent income. 12 Does this
mean that high intelligence causes a rise in
income? Perhaps. But it is also true that
children's test scores are positively correlated
with parental socioeconomic status,
although the correlation (about .40) is not as high
as many people assume. Is current income
due to intelligence, or is it a legacy of the

11 Simonton, 2006. 

12 Herrnstein and Murray, 1994. 
privilege of having come from a wealthy (or
poor) background, with concomitant opportunities
(or lack of opportunities) to get a
foothold on the economic ladder? Or both?

10.2. The Relation between Intelligence 
and Academic Achievement 

Binet’s motivation for constructing the orig¬ 
inal intelligence test was to identify children 
who were at risk for failing in the standard 
academic system. Subsequent test develop¬ 
ers generalized the goal to predicting degrees 
of success at all levels of education. How 
well has this worked? 

10.2.1. Intelligence in the K-12 System 

In 1972 the American clinical psychologist 
Joseph Matarazzo reviewed the evidence, 
and concluded that IQ, as measured by the 
Wechsler tests, was a good predictor of high 
school graduation. 13 In 1994 Herrnstein and 
Murray addressed the same question using 
the AFQT and the NLSY79 data. Figure 10.1 
shows their results for graduation rates in 
the 1980-85 period. The finding is clearly 
robust. Different tests were used at differ¬ 
ent times, and with different definitions of 
“failure to graduate.” Nevertheless, both 
studies found the same positive relation 
between test scores and probability of 
graduation. 

Matarazzo also said that, based on “thou¬ 
sands” of studies, it had been shown that 
intelligence test scores correlate .50 with 
grades in the K-12 system. 14 This estimate 
has been widely accepted by subsequent 
reviewers. 15 Later reviewers do qualify the 
estimate, by saying that the correlations tend 
to be higher in elementary than in mid¬ 
dle school, and drop to perhaps .40 in high 
school. As psychological studies are noto¬ 
rious for failures to replicate findings, the 
agreement among reviewers, over a consid¬ 
erable time span, is reassuring. 

13 Matarazzo, 1972. 

14 Matarazzo, 1972, p. 283. 

15 Brody, 1992, pp. 251-254; Jensen, 1998; Macintosh, 
1998, Chapter 2; Neisser et al., 1996. 



Figure 10.1. Percentage of White young adults in the NLSY79 
survey who did not complete high school, plotted as a function of 
their percentile scores on the AFQT. Data from Herrnstein & 
Murray, 1994. 


The .50 figure applies to measures of
grades computed across classes, elementary,
middle, or high school GPA. If the correlation
is calculated for scores within a single
class, it will drop due to range restriction
and, often, due to the lowered reliability
of locally produced tests. A study
by researchers at the University of Pennsylvania
found a correlation of just slightly
over .30 between Otis-Lennon test scores
(see Chapter 2) and academic achievement
on tests at the end of the eighth grade. 16
This study was done in a “magnet” high
school where the students had already been
selected on the basis of test scores and previous
grades, so range restriction was certainly
a factor. Other studies confined to a single
school or class have found correlations of
about .5 between test scores and grade point
averages in the early primary grades and in
high school. 17

Macintosh 18 has observed that although
restriction of range is frequently appealed
to as a mechanism that should reduce test-achievement
correlations, the effect has
never actually been observed, at least in
studies of the K-12 system. Two large European
studies come close to addressing Macintosh’s
concern.

16 Duckworth & Seligman, 2005. 

17 Kaplan, 1996; Zwick & Green, 2007. 

18 Macintosh, 1998. 


The English system of education is much 
more centralized than the American. One 
year something over 78,000 students were 
given the Cognitive Abilities Test (CAT, 
described in Chapter 2). The test takers 
included almost all the eleven-year-olds in 
England, so range restriction is not rele¬ 
vant. At age sixteen all English students take 
nationwide examinations in a variety of sub¬ 
jects. The national examinations are subject 
to much more careful psychometric evalua¬ 
tion than is typically the case for locally gen¬ 
erated (and certainly for teacher-generated) 
examinations, so reliability of the criterion 
variable was not a major concern. 

Ian Deary and his colleagues 19 extracted
a general intelligence (g) factor from the
CAT scores and a general academic achievement
factor, which I will call a, from the
scholastic examination scores. The correlation
between the two was $r_{ga} = .81$. This
is a very high value. There was substantial
variation between associations with the
g factor and educational accomplishment
within topics. Correlations ranged from a
high of .77 for mathematics to .43 for Art
and Design. In general, the topics usually
considered the academic core courses -
the Humanities, Mathematics, and the Sciences -
had correlations in the .50-.75 range,
while “practical” topics, such as Art and
19 Deary, Strand, et al., 2007. 
Figure 10.2. The distribution of RSPM scores across grades in a
representative sample of 500 Icelandic schoolchildren. From Pind,
Gunnarsdottir, & Johannesson, 2003, Figure 1, with permission from
Elsevier.


Design, Music, and Textiles, had correla¬ 
tions in the .4-.5 range. This very large study 
provides strong evidence for a robust rela¬ 
tionship between intelligence, as assessed 
by a standard test, and general academic 
achievement. 

Correlations between battery-type tests 
and academic achievement can be criticized 
on the grounds that some of the subtests 
in a test battery are close to a sample of 
academic tasks, so what we are measuring 
is the stability of academic aptitude, rather 
than a more general factor of intelligence. 
This interpretation cannot explain the fact
that results somewhat similar to those in
the English study have been obtained in
Iceland, using Raven’s Standard Progressive
Matrices (RSPM), which is certainly not
tied to the K-12 curriculum. 20 As was the
case for the English study, this study used
a very large sample, representative of the
population of schoolchildren in Iceland. Figure
10.2 shows the results. Scores rise over
the school-age years, agreeing with our intuition
that children increase in their cognitive
competence as they grow older. The
RSPM scores within class levels were correlated
with grades. At the seventh-grade level

20 Pind, Gunnarsdottir, & Johannesson, 2003.


the correlations were .75 for overall mathe¬ 
matics grades and .64 for overall Icelandic 
(language arts) grades. 

The British and Icelandic studies agree 
with each other well, even though the 
particular tests of intelligence used were 
quite different. They answer Macintosh’s 
legitimate concern. Range restriction effects 
do operate in small, localized studies, and 
therefore corrections for range restriction 
are appropriate. The correlation between 
intelligence test scores and academic 
achievement in the K-12 system, calculated 
across large units, such as districts, is at least 
.50 for the core academic courses, such as 
language arts, science, and mathematics, and 
somewhat lower for courses in vocational 
topics and in the arts. Within smaller units, 
such as a school or a class, the correlation 
will drop to around .30, due to restriction of 
range. 

In the K-12 system cognitive test scores are 
used to identify students whose low scores 
indicate that they may need to be assigned to 
special education classes. Tests are also used 
to assign students to accelerated programs 
for the gifted. In both cases other factors are 
also considered. The majority of students fall 
somewhere between these extremes, and for 
them the test scores do not matter, for no 
Table 10.1. Correlations between tests used for college/university selection, the
ASVAB general factor or AFQT, and Raven's Advanced Progressive Matrices

Test                                     SAT                 ACT                 ASVAB

ACT                                      .87 (1)

ASVAB                                    .87 (3), .92 (1)    .77 (4), .90 (1)

Raven's Advanced Progressive Matrices    .71 (2)             .61 (3)

Source: Data sources are (1) Coyle & Pillow, 2008 (NLSY97 data); (2) Frey & Detterman, 2004;
(3) Koenig, Frey, & Detterman, 2008.


decisions are made on the basis of these 
scores. In college and university entrance 
decisions test scores matter, a lot, across the 
entire range of scores. 

10.2.2. Intelligence in the Post-Secondary 
System 

Since World War II American colleges and
universities have incorporated two major
testing programs, the SAT and the American
College Testing Program (ACT), into
the admissions process. These tests are validated
regularly, by correlating test scores
with first-year grade point average (GPA1),
cumulative grade point average (GPAC), or
probability of graduation within a specified
period of time after matriculation, usually
four to six years. The use of the tests is
not without controversies, a point that was
made earlier (section 2.7.3). We concentrate
on technical rather than policy issues here.

Recall, from the discussion in Chapter 2,
that the first portion of the current SAT,
referred to officially as the SAT-I, contains
sections stressing verbal comprehension and
logical reasoning. By tradition (and officially,
in earlier versions), these two sections have
been referred to as the SAT-V and SAT-M.
I will continue this usage. Both tests
represent attempts to evaluate comprehension
and reasoning without tying questions
to specific high school curricula.

The American College Testing program 
takes a different approach. It develops tests 
that are specifically tied to curricular mate¬ 
rial, such as history and mathematics. The 
idea is to predict what a student will learn 
in college by determining how much he or 
she has learned in high school. The second 


part of the SAT program, the SAT-II, does 
the same thing. 

Although there have been arguments 
about which approach is better, 21 the tests 
could be interchanged in an academic 
selection program without changing accep¬ 
tance and rejection decisions very much. 
Table 10.1 presents estimates of the cor¬ 
relations between the SAT, ACT (sum¬ 
mary score), and the general factor derived 
from the ASVAB, which is closely approx¬ 
imated by the Armed Forces Qualify¬ 
ing Test (AFQT). The correlations with 
Raven's Advanced Progressive Matrices are 
also included, in order to show the rela¬ 
tion between the educational tests and an 
avowed marker for general intelligence, g. 

The correlations are quite high. The correlation
between the SAT and the ACT
approaches the reliabilities of the two
tests. This suggests that the true correlation
between the two tests is one! A study
using NLSY97 data found that both academic
aptitude tests had loadings of about
.9 on a general factor derived from the
ASVAB. 22 The finding is important because
the ASVAB general factor is a measure of
crystallized intelligence (Gc), rather than
of Gf. 23

The need to distinguish between Gf and 
Gc in college students is shown by the fact 
that the correlations between matrix tests 
and the SAT and the ACT are in the .6- 
.7 range. 24 This is about what one would 

21 Lemann (1999) discusses the dispute in some detail.
It has been carried forward to this day.

22 Coyle & Pillow, 2008. 

23 Roberts et al., 2000. 

24 Frey & Detterman, 2004; Koenig, Frey, & Detterman, 2008.
expect, given that Gc and Gf
are themselves correlated in the .5-.7 range,
depending upon the sample. Because aspiring
and attending college students represent
roughly the upper two-thirds of the general
population, in terms of cognitive skills, one
expects the general factor to be somewhat
weaker among this group than among the
population at large. (See Chapter 3 for elaboration.)

Do the tests work? Several appropriately
designed large studies of the SAT have produced
consistent results. The correlation
between SAT scores and GPA1 is approximately
.35 in students who have been admitted,
and who therefore have both SAT scores
and GPA1s available. 25 This is the uncorrected
correlation in the selected population,
whereas what is needed is the predictive
correlation in the applicant population. An
extensive study by Paul Sackett and his colleagues
at the University of Minnesota, in
which they conducted a meta-analysis of
previous studies, shows quite clearly what
the situation is.

Sackett and his colleagues analyzed data 
provided by the College Board for 41 colleges 
and universities where the SAT was used 
in 1995-97. Over 155,000 test takers were 
involved. The researchers calculated three 
SAT-grade correlations. They were: 

1. $r_s$: the correlation between SAT and
GPA1 in admitted students, calculated
within institutions and then averaged.
$r_s = .35$.

2. $r_{p1}$: $r_s$ corrected for restriction of range
within the applicant population for each
institution, and then averaged. This is
the predictive correlation that would
be of interest to admission officers.
$r_{p1} = .47$.

3. $r_{p2}$: $r_s$ corrected for restriction of range
of SAT scores across all institutions.
This can be thought of as the predictive
correlation to be used to determine
the benefit of using the test across all
participating institutions. $r_{p2} = .53$.
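To see how a correction for restriction of range moves a correlation of this size, consider the sketch below. It uses the standard univariate correction for direct selection on the predictor (often called Thorndike's Case 2), which is an assumption of this illustration and not necessarily the procedure Sackett and his colleagues applied. The standard-deviation ratio of 1.4 is hypothetical.

```python
from math import sqrt


def correct_range_restriction(r_restricted, sd_ratio):
    """Univariate correction for direct range restriction on the predictor.

    sd_ratio is the predictor's standard deviation in the unrestricted group
    divided by its standard deviation in the restricted (selected) group.
    """
    u = sd_ratio
    return r_restricted * u / sqrt(1 + r_restricted ** 2 * (u ** 2 - 1))


observed_r = 0.35        # within-institution value from the list above
hypothetical_ratio = 1.4  # illustrative only; not taken from the study

print(round(correct_range_restriction(observed_r, hypothetical_ratio), 2))
```

With no restriction (a ratio of 1) the correlation is returned unchanged, and the corrected value grows as the selected group becomes more homogeneous relative to the applicant pool.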

25 Geiser & Studley, 2002; Kobrin et al., 2008; Sackett 
et al., 2009. 
Freshman grade point averages indicate
a student's initial reaction to college. What 
about predicting later performance or grad¬ 
uation? Beyond the first year there is great 
variation in the courses college students 
take, and there are also substantial differ¬ 
ences in grading practices across disciplines. 
This muddies the situation. 

There is a negative correlation between 
the SATs of students within an academic 
program and the mean grade point assigned 
by that program. This is because mathemat¬ 
ics and science programs, which assign rela¬ 
tively low grades, tend to draw the students 
with the highest SATs, while humanities 
and education programs, which assign high 
grades, draw students with lower SATs. The 
effect is quite large. A study involving over 
200,000 students from 38 public universities 
during the 1990s 26 found that the difference 
in SAT scores between the discipline with 
the highest entering scores, engineering, and 
the one with the lowest scores, education, 
was .92 standard deviation units. 27 The neg¬ 
ative correlation between the rigorousness 
of grading within a discipline and the SATs 
of the entering students will reduce the cor¬ 
relation between overall GPAs and entering 
SATs, calculated over the institution as a 
whole. 
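A gap of .92 standard deviation units can be translated into overlap between the two groups. The sketch below assumes that scores in both disciplines are normally distributed with equal standard deviations, which is a simplification made only for illustration.

```python
from statistics import NormalDist

GAP_IN_SD = 0.92  # engineering mean minus education mean, from the text

# Share of the lower-scoring group expected to fall above the higher
# group's mean, assuming normal distributions with equal spread.
share_above = 1 - NormalDist().cdf(GAP_IN_SD)

print(round(share_above, 3))
```

Under these assumptions a little under 18% of entering education students would be expected to score above the engineering mean.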

The probability of graduation behaves
much like, but not exactly like, GPA1.
Herrnstein and Murray’s analysis of the
NLSY79 database showed that, as of the
1980s, approximately 70% of the survey participants
in the top decile of AFQT scores
obtained bachelor’s degrees. This fell to
30% in the eighth decile, and to 10% in
the fifth decile. 28 A detailed report from
the College Board 29 found that graduation

26 Kroc et al., 1997. 

27 This is a conservative estimate, based on the
assumption that the within-discipline standard deviation
is equal to the population standard deviation.
The assumption is very unlikely to be valid.
The effect of interdisciplinary variation would be to
reduce the variance (and hence the standard deviation)
within disciplines. The upshot would be that
less than 18% of the entering education students
would be expected to have SAT scores above the
engineering mean.

28 Herrnstein & Murray, 1994, p. 37.

29 Kroc et al., 1997. 
rates are nonlinearly (logistically) related to
an index composed of SAT, High School
Grade Point Average (HSGPA), and several
demographic variables, including gender
and race. People with relatively low
scores on the index generally were unlikely
to graduate; people with high scores were
highly likely to graduate; and the probability
of graduation changed markedly
between “low average” and “high average”
scores.

As was the case for the K-12 system,
the findings on the correlation between
test scores and college/university success are
strikingly consistent. The SAT, the most
widely used test, has a predictive validity
of about .5. This is probably an underestimate
of the correlation between the SAT
and an abstract measure of academic ability.
Because students with high SAT scores
are more likely to enroll in “tough-grading”
courses than students with low SATs, the
SAT-GPA1 correlations will be depressed
below what they would have been if all students
took the same courses.

Are these correlations really enough to 
justify test use? Answering that question 
requires a brief discussion of the statistics 
of personnel selection. 

10.2.3. Cognitive Tests and Selection 
Decisions 

The college/university admissions process is 
an example of a personnel selection decision. 
How useful are entrance examinations, such 
as the SAT, in making such decisions? This 
raises the question of how high a correlation 
has to be in order to be useful in practice, 
whether or not it is “statistically significant.” 
This depends upon how the correlation is to 
be used. 

A widely cited way of evaluating the size
of a correlation is to square it, and then
to report it as the proportion of variance
accounted for in either variable by predictions
using the other variable. In the admissions
case, $r^2$ would be the proportion of
variance in grades that could be associated
with variance in an admissions test - approximately
$.5^2 = .25$. Multiplied by 100, one


could say that 25% of the variance in grades 
is accounted for by variance in the examina¬ 
tion. If, as is often (and erroneously) done, 
this logic is applied to the uncorrected cor¬ 
relation between grades and test scores, in 
the population of admitted students, .35 2 = 
.11, so 11% of the variance of grades is asso¬ 
ciated with variance in test scores - which 
does not seem high. However, that is mis¬ 
leading. 

If the selector uses a screening exam¬ 
ination, it is possible to predict aptitude 
(grades or workplace performance) and 
accept people in order of predicted perfor¬ 
mance. Unless prediction is perfect (pre¬ 
dictive validity = 1) people with the same 
predicted performance usually turn out to 
have different actual performances. Stu¬ 
dents with identical SATs do not all have 
identical grades. In statistical terms, there is 
variance around the predicted performance 
level, and the greater the variance, the less 
accurate the prediction. However, variance 
around the predicted performance can never 
be greater than the variance in the appli¬ 
cant population. So variance in the applicant 
population can be used to scale the extent 
to which the prediction is not accurate. The
ratio $I$ = (variance around predicted value
of aptitude)/(variance of aptitude in applicant
population) represents an “inaccuracy”
index, relative to the inaccuracy that would
be achieved without using a selection examination.
It follows that the complement of
$I$, $1 - I$, is an index of accuracy. It can
be interpreted as the relative reduction in
inaccuracy achieved by using a predictor
test. The $I$ index is related to predictive
validity, as defined in section 10.1, by the
equation

$$r_p^2 = 1 - I \qquad (10.5)$$

where $p$ = 1 or 2, depending upon whether
you are interested in within-institution or
across-institution predictivity. Multiplied by
100, $r_p^2$ is the percent increase in efficiency
achieved by using a screening examination.
If, as is the case, $r_p = .5$, the increase in
efficiency is 25%.
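
The relation in equation 10.5 can be checked numerically. The following is a minimal sketch under stated assumptions, not a reconstruction of any analysis cited here: it assumes standardized, bivariate-normal test scores and aptitude, and the function names `simulate_applicants` and `inaccuracy_index` are hypothetical. It fits a least-squares line predicting aptitude from the test score, forms the ratio I of the variance around the prediction to the variance in the applicant population, and compares 1 - I with the squared correlation.

```python
import random


def simulate_applicants(validity, n=20000, seed=0):
    """Draw n (test score, aptitude) pairs with the given population correlation.

    Assumption for illustration only: both variables are standard normal.
    """
    rng = random.Random(seed)
    scores, aptitudes = [], []
    for _ in range(n):
        score = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        aptitude = validity * score + (1 - validity ** 2) ** 0.5 * noise
        scores.append(score)
        aptitudes.append(aptitude)
    return scores, aptitudes


def inaccuracy_index(scores, aptitudes):
    """Return (r, I): the sample correlation and the inaccuracy index.

    I = (variance around the least-squares prediction) / (variance of aptitude).
    """
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(aptitudes) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    syy = sum((y - mean_y) ** 2 for y in aptitudes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, aptitudes))
    slope = sxy / sxx
    residual_ss = sum(
        (y - mean_y - slope * (x - mean_x)) ** 2 for x, y in zip(scores, aptitudes)
    )
    r = sxy / (sxx * syy) ** 0.5
    return r, residual_ss / syy


scores, aptitudes = simulate_applicants(validity=0.5)
sample_r, inaccuracy = inaccuracy_index(scores, aptitudes)
print(f"r = {sample_r:.3f}, r^2 = {sample_r ** 2:.3f}, 1 - I = {1 - inaccuracy:.3f}")
```

For a least-squares prediction the two quantities agree exactly in the sample, so with a validity near .5 both come out close to .25.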
Figure 10.3. Test accuracy and rejection rate interact to produce
quality acceptances. The expected value of aptitude for an accepted 
candidate (student or worker), measured in terms of the percentile 
of aptitude in the applicant population and shown as a function of 
the rejection rate and the predictive validity of a screening 
examination. 


At this point we can see an argument 
brewing between the admissions commit¬ 
tee and the rejected applicants. Suppose an 
applicant is rejected, and then learns that 
among accepted applicants (students) the 
correlation between SAT and grades is only 
.35. How dare the committee reject an applicant
on the basis of a test that is only about 12%
better than chance?

The committee’s first reply can be that 
the correlation is not really .35; it is .5. The 
applicant’s rejoinder is that a 25% improve¬ 
ment over chance still is not good enough. 
But this is not the admissions committee’s 
real argument. 

The admissions committee is not inter¬ 
ested in the accuracy of individual predic¬ 
tions; it is interested in selecting the best 
possible entering class. Suppose that the 
institution has room for only 10% of its appli¬ 
cants (a rejection rate of 90%). Insofar as is 
possible, the committee wants to select the 
top 10% of the applicants in terms of aca¬ 
demic aptitude. However, the committee 
knows only the top 10% of the test scorers. 
If $r_p = 1$, the two “top ten percents” will be
the same people; to the extent that $r_p$ is less
than 1, there will be some disagreement.

The success of the selection process will 
be determined by both the accuracy of the 
test, $r_p$, and the rejection rate. If the rejection
rate is zero, everyone who wants to enter
gets to enter. The accuracy of the test does 
not matter, because no decision is going to 
be made using the test score. At the other 
extreme, suppose there is just room for one 
person. The person accepted will be the one 
with the highest test score, and the proba¬ 
bility of that person being the person with 
the highest aptitude in the applicant pool 
will depend upon the accuracy of the test. 

Between these two extremes, the expected
quality of accepted applicants is
determined by an interaction between $r_p$ and
the rejection rate. The form of this interaction
is shown in Figure 10.3, using Sackett's
two estimates of the predictive correlation 
as examples. If the rejection rate is low, the 
value of the predictive correlation matters 
very little. If the rejection rate is high, it mat¬ 
ters a lot. For example, if the rejection rate is 
90%, as it is for some of our elite universities, 
the use of an entrance examination with a 
predictive correlation of .47 can improve the 
mean level of aptitude in the entering class 
from the fiftieth percentile of the aptitude 
in the applicant population (no test used) to 
about the seventy-seventh percentile. 

Exactly the same reasoning applies to
industrial hiring. If the rejection rate is high,
a screening examination with predictive
validity in the .4-.5 range can substantially
improve the selection process, as seen by the
employer.

Figure 10.4. Median weekly earnings in 2008 as a function of level of
education. Source: US Bureau of Labor Statistics. HS - high school
diploma.

Note that quality has been defined in 
terms of the quality available in the appli¬ 
cant population. Personnel selection has to 
operate with this constraint; you cannot 
select people who do not apply. Any recruit¬ 
ment technique that improves or diminishes 
the distribution of aptitudes in the appli¬ 
cant population will affect the quality of 
the selected applicants - either the student 
body or the workforce. What this effect will 
be will depend upon the amount of change 
in applicant aptitude and upon the effect 
of added or reduced recruitment upon the 
rejection rate. 

10.2.4. Alternatives and Augmentations 
to the Use of Test Scores in College 
Entrance Decisions 

Chapter 9, section 5, contained a century- 
old quotation from Theodore Roosevelt 
about the economic value of education. 
Figure 10.4 shows income figures, as of 2008, 
as a function of level of education com¬ 
pleted. Roosevelt’s remark rings true. How 
we decide who gets to go to college makes 
a tremendous difference in who has eco¬ 
nomic and social opportunity. Therefore, 
it is understandable that college entrance 


examinations such as the SAT and the ACT 
have received considerable scrutiny. 

SAT scores are positively correlated with 
parental SES. This has led some to fear that 
using the SAT simply identifies applicants 
who have the social and financial resources 
to complete the undergraduate program. 
Giving these applicants preference in college 
admission will therefore exacerbate inher¬ 
ited social advantages, something that is 
generally not considered a good thing in 
a democracy. (Paradoxically, the SAT was 
originally designed to reduce these advan¬ 
tages! See the discussion of the SAT in 
Chapter 2.) 

To what extent is this concern war¬ 
ranted? The way to investigate the ques¬ 
tion is to examine the partial correlation 
between grades and test scores, equating 
for SES. If the SAT is a proxy for SES, 
the partial correlation should approach zero. 
However, it does not. An analysis of the 
forty-one-institution data collected by the 
College Board found that the partial corre¬ 
lation, based on an analysis of over 155,000 
students, is .44 - very little different from 
the predictive correlation without consider¬ 
ing SES, .47. The analysis can be reversed, 
to see if SES is associated with first-year 
grades, after equating for test scores. When 
this is done the correlation between SES and 
GPAi drops from .31 to .05. Similar results
were found in the meta-analysis of previous 
studies. 30 
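
The logic of “equating for SES” can be made concrete with the standard first-order partial correlation formula. The sketch below is illustrative only. The test-grades and SES-grades correlations are the ones quoted above, but the test-SES correlation is a hypothetical placeholder, because that value is not given here; the printed result is therefore not an attempt to reproduce the reported .44. The function name `partial_correlation` is also an assumption of this sketch.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after holding z constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# x = test score, y = first-year grades, z = parental SES.
r_test_grades = 0.47  # quoted in the text
r_ses_grades = 0.31   # quoted in the text
r_test_ses = 0.30     # HYPOTHETICAL: not reported in the text

# Test-grades correlation with SES held constant.
print(partial_correlation(r_test_grades, r_test_ses, r_ses_grades))

# Pure-proxy case: if the test related to grades only through SES, the
# test-grades correlation would equal r_test_ses * r_ses_grades, and the
# partial correlation would be zero.
proxy = partial_correlation(r_test_ses * r_ses_grades, r_test_ses, r_ses_grades)
print(proxy)
```

The second call shows the pattern that would be expected if the SAT were merely a proxy for SES, which is the pattern the data do not show.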

These values are consistent with the 
assumption that parental SES does influence 
undergraduate performance, but that it does 
so as a distal variable. SES exerts its influ¬ 
ence by influencing traits that are impor¬ 
tant for success as an undergraduate and that 
are measured by the SAT. Presumably these 
traits are what we mean by intelligence. 

High school grade point average 
(HSGPA) has also been used as a predictor 
of GPAi and college graduation. A study 
conducted in the University of California 
system found that, averaged over the incom¬ 
ing freshman classes of 1996-99, the SAT-I 
could predict 13% of the variance in GPAi 
(r = .36), HSGPA could predict 15.4% (r =
.39), and the two of them together could
predict 20.8% of the variance (R = .46). 31
These correlations, which are consistent 
with the data from the forty-one-institution 
study, have not been corrected for range 
restriction. As far as accuracy of prediction 
is concerned, the appropriate thing to do is 
to combine the entrance examination and 
HSGPA into a single index. 
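
How much a combined index gains depends on how strongly the two predictors overlap. The sketch below uses the standard formula for the multiple correlation of a criterion with two predictors. The two validities are the ones quoted above; the correlation between SAT and HSGPA is not given in the text, so the value used is a hypothetical one, chosen only because it yields a multiple R close to the reported .46. The function name `multiple_r` is likewise an assumption.

```python
import math


def multiple_r(r1, r2, r12):
    """Multiple correlation of a criterion with two predictors.

    r1, r2: correlation of each predictor with the criterion.
    r12:    correlation between the two predictors (must not be +1 or -1).
    """
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


r_sat = 0.36      # quoted in the text
r_hsgpa = 0.39    # quoted in the text
r_between = 0.36  # HYPOTHETICAL SAT-HSGPA correlation, not reported in the text

combined = multiple_r(r_sat, r_hsgpa, r_between)
print(round(combined, 2))  # 0.46
```

The less the two predictors correlate with each other, the more the second one adds; if they were uncorrelated, the squared validities would simply sum.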

As is true of all cognitive tests, the SAT 
has been designed to measure “can do” 
aspects of cognition. Cumulative indices of 
performance, such as HSGPA and GPAi, 
also tap “will do” aspects of performance, 
such as study habits and perseverance. We 
have seen this already for HSGPA; the 
same thing is true for GPAi. 32 The fact that 
HSGPA and the SAT, combined, do bet¬ 
ter than either alone is further support for 
an expanded definition of intelligence, to 
include skill in allocation of effort over the 
long haul, outside of the conventional test¬ 
ing paradigm. 

We have been concerned here solely with 
the relation between intelligence and cog¬ 
nitive performance after matriculation. We 
need to remember that this is a limited 
view of only one aspect of the admissions 

30 Sackett et al., 2009. 

31 Geiser & Studley, 2002. 

32 Crede & Kuncel, 2008. 


decision. Admissions policy makers also con¬ 
sider other ways in which prospective stu¬ 
dents may contribute to the university. 
These vary from maintaining family ties 
to an institution (the “legacies” that pro¬ 
duce endowments on which some univer¬ 
sities rely) to a student’s athletic abilities. 
Policy makers are also influenced by a desire 
to balance male-female ratios and to pro¬ 
mote racial and ethnic diversity in the stu¬ 
dent body. Considering the appropriate role 
of these goals in student admissions would 
take us far beyond a discussion of the value 
of intelligence in education. 

10.2.5. Post-Graduate Education 

In 1837 a twenty-three-year-old named 
Edward Cree received a license to practice as 
an apothecary from the University of Edin¬ 
burgh. Having a desire to go to sea, he was 
appointed a Surgeon in the Royal Navy. 
Ten years later he took some time off from 
the Navy to complete his M.D. 33 The same 
sort of thing happened in the United States. 
The Greenfield Village “living museum,” part 
of the Henry Ford Museum, contains an 
account of a mid-nineteenth-century physician
who “studied medicine awhile” at the
University of Michigan and Case-Western 
Reserve University, decided he had learned 
all about medicine that he needed to know, 
and set up his practice on the northwestern 
frontier. 

Today we are a bit more formal. We 
demand completion of programs for entry 
into many professions. No credit is given 
for attendance. The rewards for completion 
can be considerable. Just as there is a 50% 
increase in income for going from the High 
School degree to the Bachelor's, there is 
another 50% by going from the Bachelor’s 
to the Doctorate (Figure 10.4). 

A variety of cognitive tests are used as 
screening examinations for post-graduate 
educational programs. It is difficult to say 
anything comprehensive about their valid¬ 
ity, for the importance of grades in gradu¬ 
ate education varies tremendously with the 

33 Cree, 1982. 





Table 10.2. Odds ratios comparing probability of graduation for the top and bottom
halves of admission test scores in an entering population of post-graduate students

Test                                 Typical Manner of Use                                  Odds Ratio
-----------------------------------  -----------------------------------------------------  ----------
Graduate Record Examination          Entrance into Ph.D. programs in many fields of study   2.3:1
Miller Analogies Test                Entrance into Ph.D. programs in many fields of study   2.2:1
Law School Admissions Test           Entrance into Law School                               1.4:1
Graduate Management Admissions Test  Entrance into MBA programs in business and management  1.6:1
Medical College Admissions Test      Entrance into Medical School                           1.7:1

Source: Data excerpted from the supporting online material for Kuncel & Hezlett, 2007, Table S1.
Reprinted with permission from AAAS.


program. It is my impression (but no more
than that) that grades are taken fairly seriously
in professional programs such as Law,
Business, and Medicine, and are regarded as 
incidental to research participation in most 
science programs. A validity measure that 
does not require equating grades across pro¬ 
grams is the accuracy with which high scores 
predict program completion. This is mea¬ 
sured by the odds ratio for program comple¬ 
tion, which is defined as 

$$\text{Odds Ratio} = \frac{\text{Completion rate for students whose entrance scores are in the top half}}{\text{Completion rate for students whose entrance scores are in the bottom half}}.$$

Table 10.2 shows the odds ratios for a vari¬ 
ety of entrance examinations used as part 
of the screening examinations for entry into 
various graduate schools. The odds ratios 
vary from a low of 1.4 (for the Law School
examination) to a high of 2.3 (Graduate
Record Examination). Having a test score in
the top half of applicants is associated with 
at least a 40% improvement in probability of 
graduation, compared to test scorers in the 
bottom half. 
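
As a small worked example, the ratio can be computed exactly as it is defined above, as one completion rate divided by another. The cohort counts below are hypothetical, invented only to show the arithmetic, and `completion_rate_ratio` is an illustrative name.

```python
def completion_rate_ratio(top_completed, top_entered, bottom_completed, bottom_entered):
    """Ratio of completion rates, top half of test scorers over bottom half.

    Follows the definition given in the text: each rate is completers / entrants.
    """
    top_rate = top_completed / top_entered
    bottom_rate = bottom_completed / bottom_entered
    return top_rate / bottom_rate


# HYPOTHETICAL cohort: 100 entrants in each half of the score distribution;
# 70 of the top half and 50 of the bottom half complete the program.
ratio = completion_rate_ratio(70, 100, 50, 100)
print(f"{ratio:.1f}:1")  # 1.4:1
```

A ratio of 1.4:1 corresponds to the 40% improvement mentioned above: the top-half completion rate is 1.4 times the bottom-half rate.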


Although having a post-graduate degree 
clearly pays off, completing a post-graduate 
course often entails considerable financial 
and personal sacrifice in the short term. The 
information in Table 10.2 is of as much use 
to an accepted applicant, trying to decide 
whether to enter graduate school, as it is to 
an admissions officer. 


10.3. The Workplace 

Do tests of intelligence predict performance 
in the workplace? Here is a claim by three 
industrial-organizational psychologists. 

Many lay people, as well as social scien¬ 
tists, subscribe to the belief that the abili¬ 
ties required for success in the real world 
differ substantially from what is needed to 
achieve success in the classroom. Yet, this 
belief is not empirically or theoretically sup¬ 
ported. A century of scientific research has 
shown that general cognitive ability, or g, 
predicts a broad spectrum of important life 
outcomes, behaviors, and performances. 

Kuncel, Hezlett, & Ones, 2004, p. 148

Putting Kuncel and colleagues' proposition 
more argumentatively, this is a case where 
the public (and many social scientists) have
made up their mind, so please do not con¬ 
fuse them with facts. 
Linda Gottfredson is ready to plunge
ahead, whether she is believed or not: 

In no realm of life g is all that matters, but 
neither does it seem irrelevant in any. In 
the vast toolkit of human abilities, none has 
been found as broadly useful - as general - 
as g. 

Gottfredson, 2002, p. 332 

Gottfredson is right, but Kuncel and col¬ 
leagues are right to be concerned that the 
facts are a hard sell. Some of the reasons why 
are captured in a third quote, this time by 
J. Raven, the son of the J. C. Raven who 
developed progressive matrix testing and 
himself a prolific researcher on intelligence. 
Given his pedigree, one might expect J. 
Raven to take Gottfredson’s position, but 
he is rather hesitant. 

In the workplace and in the educational 
system numerous other qualities are impor¬ 
tant but remain invisible if one utilizes only 
tools developed within the traditional mea¬ 
surement paradigm, focuses mainly on con¬ 
ventional criteria of job performance, and 
accepts assumptions about the functional¬ 
ity of hierarchical organization of work¬ 
places and society. 

J. Raven, 2008c, p. 432

J. Raven further argues that the impor¬ 
tant things determining job performance 
are not general cognitive power, but rather 
the specific skills and the motivation that a 
person brings to work. He also points out 
that evaluations of both job performance 
and academic success take place in con¬ 
strained situations. The constraints of the 
situation may be just as important as cog¬ 
nitive capabilities in determining behavior. 
Constraints on job performance vary widely 
across the workplace, while academic con¬ 
straints are more uniform. This argument is 
worth developing. 

In the academic setting there is a reason¬ 
ably clear-cut criterion for success - how 
well does a student know the material stated 
in the curriculum? When implicit objectives 
like “teaching a student how to think” are 
introduced, agreement over the criteria for 
success vanishes. 


10.3.1. Some Evidence from Studies 
of Military Enlisted Performance 

In the 1980s the United States Department 
of Defense conducted extensive studies of 
the prediction and assessment of the job 
performance of enlisted personnel. 34 The 
predictive measurements taken included 
cognitive and personality tests and bio¬ 
graphical statements of interests. Occupa¬ 
tional assessments were similarly varied. 
They included examination of service record 
books (which contain job performance rat¬ 
ings) and records of promotions, commen¬ 
dations, and disciplinary actions. In addi¬ 
tion, both pencil-and-paper and hands-on 
performance tests were given. Examinees 
had to demonstrate their general skills and 
knowledge as soldiers, sailors, marines, or 
airmen and their proficiency in their specific 
occupations. The occupations chosen var¬ 
ied from strictly military positions, such as 
infantrymen and artillerymen, to jobs with 
exact counterparts in the civilian world, 
such as automobile mechanics, clerks, and 
cooks. 

Five dimensions of job performance were 
identified. Two, general military proficiency 
and technical proficiency in one’s specialty, 
were “can do” measures. They evaluated 
how well a person could do his or her job, 
when they knew that they were being eval¬ 
uated. The next three factors were “will do” 
measures. Discipline referred to whether or 
not the individual followed regulations and 
could be relied upon to be ready to do his 
or her job. Leadership referred to the ability 
to encourage others and to take initiative. 
Fitness referred to personal bearing, appear¬ 
ance, and physical fitness. With the possible 
exception of fitness these dimensions apply 
to both the military and civilian workplaces. 

Figure 10.5 shows the relation between 
the five factors and measures of personality, 
biographical interests, and cognitive perfor¬ 
mance (including scores derived from the 
ASVAB). The cognitive measures were the 
best predictors, by far, of the two “can do” 

34 Campbell & Knapp, 2001. 

[Figure 10.5: bar chart of predictive validity (horizontal axis, 0 to 0.8) for five criterion measures (technical proficiency, general soldiering, leadership, discipline, fitness), with separate bars for vocational interest, temperament/personality, and cognitive ability.]

Figure 10.5. Correlations between predictors and criterion
measures in the U.S. Army study of enlisted performance. Data
from McHenry et al., 1990, Table 4.


factors. Interest and personality measures 
were the best predictors of the “will do” 
aspects of job performance. 

Steven Hunt, an industrial and organi¬ 
zational psychologist, has pointed out that 
the first two steps in developing an assess¬ 
ment program in industry are to define the 
job that you expect employees to do and 
to determine how you are going to decide 
whether their performance measures up to 
these expectations. 35 It is not reasonable to 
expect anyone to excel in all aspects of per¬ 
formance. To the extent that the required 
job skills are themselves not correlated, it 
is impossible for one predictor to predict 
them all. The results shown in Figure 10.5 
illustrate Steven Hunt's point. “Can do” is 
useless without “will do.” 

Job performance is highly dependent 
upon experience, because the more one 
practices something, the better one becomes 
at it. Expertise in complex tasks can take 
years to acquire. 36 Expertise implies the abil¬ 
ity to learn from experience. 

Further military studies showed that job 
performance was a joint function of expe¬ 
rience on the job and intelligence. Soldiers 
at all intelligence levels took about eighteen 
months to approach their top levels of per¬ 
formance, with much slower improvement 
in the next two years. Soldiers with the high¬ 
est cognitive scores (AFQT Level I and II) 

35 S. Hunt, 2007, Chapter 5. 

36 Ericsson, 2003. 


performed better after six months on the job 
than soldiers with lower scores after forty- 
two months. 37 

The military provides a highly structured 
workplace, and the workforce is younger 
than the civilian workforce. What are the 
relationships between intelligence and per¬ 
formance in the civilian workplace? 

10.3.2. Evidence from the Civilian 
Workplace 

Literally hundreds of studies have been 
done of the relation between test scores 
and job performance in the civilian sec¬ 
tor, using tests ranging from the ASVAB, 
which takes several hours, to the Won- 
derlic and the Raven tests. Two American 
industrial-organizational psychologists, John 
Hunter and Frank Schmidt, have conducted 
a number of widely cited meta-analyses of 
these results. Figure 10.6, taken from one 
of their best-known studies, 38 shows that in 
the blue-collar, clerical, and administrative 
occupations the predictive validity of gen¬ 
eral intelligence, averaged over all studies, is 
.51 (corrected for range restriction and unre¬ 
liability in the job performance criterion). 
The validity coefficient can be increased 
by combining a measure of general mental 
ability with various other assessment meth¬ 
ods. Predictive ability can be raised to its 

37 Wigdor & Green, 1991.

38 Schmidt & Hunter, 1998. 

[Figure 10.6: bar chart. Assessment methods shown: job tryout, job knowledge, reference check, peer rating, unstructured interview, conscientiousness test, integrity test, work sample, general intelligence. Bars: Multiple R and Correlation. Horizontal axis: correlation corrected for range restriction and criterion unreliability.]

Figure 10.6. The correlations between measures of job
performance, measures of general intelligence, and a variety
of other assessment measures. The values shown are for the
correlation between job performance and the assessment
(Correlation) and the correlation between job performance and
an optimum weighting of the assessment and the assessment of
general intelligence (Multiple R). Data from Schmidt & Hunter,
1998, Table 1.


maximum validity, .65, by combining a mea¬ 
sure of general intelligence with a test of 
integrity. (Conscientiousness is a close sec¬ 
ond, at .60.) This illustrates the combined 
importance of “can do” and “will do” traits.

The correspondence between the mili¬ 
tary and civilian data shows that the findings 
are robust over different situations and dif¬ 
ferent methods of evaluation. The military 
data was gathered by direct observation of 
young adults; the civilian figures were based 
on a meta-analysis of dozens of small studies, 
covering all age ranges, but none as compre¬ 
hensive or rigorous as the military studies. 

The fact is clear. General intelligence has 
a predictive validity of about .50 in the 
workplace, just as it does in academia. No 
other method of assessment does any better. 
Nevertheless, people persist in using other 
techniques for predicting workplace perfor¬ 
mance. An examination of these alternatives 
is in order. 

The only type of test with predictive 
ability greater than a test of general intel¬ 
ligence is a work sample (correlation of 
.54 compared to .51). This can be used in 
some situations. For instance, when musi¬ 
cians audition for places in major symphony 
orchestras they are often asked to play their 
instruments behind a curtain, so that the 
judges do not know who the candidate is. 


Work samples have two highly desirable 
qualities: they are statistically valid, and they 
are easily justified when assessment meth¬ 
ods are challenged. Their drawbacks are that 
they can be rather expensive and that they 
can be used only if the candidates for a job 
have already been trained to do the job. 

Combining a work sample and a gen¬ 
eral intelligence measure increases predic¬ 
tive validity to .63. The increase is not 
surprising, for by combining a general intel¬ 
ligence measure with a work sample the 
employer is simultaneously informed about 
the prospective employee's general reason¬ 
ing powers and specific job knowledge. 

In personnel selection situations a test 
this accurate, combined with a high rejec¬ 
tion rate, can greatly increase the quality of 
the employed workforce. Recall that if no 
screening test is used, the average person 
hired should have an ability level equal to 
the fiftieth percentile (median) of the appli¬ 
cant population, regardless of the rejection 
rate. If a predictive validity of .63 is com¬ 
bined with a rejection rate of 50% (half 
the applicants are hired), the average ability 
level of a person hired will be at the sixty- 
ninth percentile of the applicant population. 
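
The figure just quoted can be approximated with a short calculation. This is a sketch under simplifying assumptions that are mine, not taken from the text: test score and ability are treated as standardized and bivariate normal, applicants are hired strictly from the top of the test-score distribution, and “average ability level” is read as the percentile of the mean ability of those hired. The function name `mean_percentile_of_hired` is hypothetical, and values read from Figure 10.3 may differ somewhat from what this simplified model gives.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def mean_percentile_of_hired(validity, rejection_rate):
    """Percentile (0-100), in the applicant population, of the mean ability
    of applicants hired by top-down selection on the test score.

    Assumes standardized, bivariate-normal test score and ability.
    """
    if not 0 <= rejection_rate < 1:
        raise ValueError("rejection_rate must be in [0, 1)")
    if rejection_rate == 0:
        return 50.0  # everyone is hired, so the test plays no role
    cutoff = STD_NORMAL.inv_cdf(rejection_rate)  # test-score cutoff
    selection_ratio = 1 - rejection_rate
    # Mean test score of those above the cutoff (truncated normal mean).
    mean_test_score = STD_NORMAL.pdf(cutoff) / selection_ratio
    # Regression toward the mean: expected ability is validity * test score.
    mean_ability = validity * mean_test_score
    return 100 * STD_NORMAL.cdf(mean_ability)


print(round(mean_percentile_of_hired(0.63, 0.50)))  # 69
print(round(mean_percentile_of_hired(0.00, 0.50)))  # 50
```

With a validity of zero the result stays at the fiftieth percentile whatever the rejection rate, which is the no-test baseline described above.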

An unstructured interview is an interview 
in which the recruiter and the candidate 
“just chat,” so that the recruiter can get a 
feel for the candidate. This is probably the
most widely used selection procedure. The 
unstructured interview is not very good on 
its own (r = .38, corrected) and adds very 
little to the information gained in a test of 
general intelligence. 

A structured interview (not shown in 
the figure, but included in Schmidt and 
Hunter’s analyses) is an interview in which 
the recruiter has decided, beforehand, what 
topics are to be discussed in the interview, 
and what information must be provided. 
The technique requires a careful analysis 
of the requirements of the position to be 
filled, before searching for candidates. Struc¬ 
tured interviews have good predictive valid¬ 
ity, both on their own and when combined 
with a test of general intelligence (r = .51, 
R = .63). 

Job knowledge is usually assessed by per¬ 
formance on a written test, where the ques¬ 
tions are chosen to reflect what a job holder 
should know. This is a face-valid measure; 
we can reasonably expect bus drivers to 
know the rules of the road, and firefight¬ 
ers to know how to use various pieces of 
equipment. Job knowledge is not quite as 
good a predictor as is general intelligence, 
but it does add to predictive validity beyond 
that provided by a general intelligence score 
(r = .48, R = .58). In terms of the Gf-Gc 
model of intelligence, what a job knowledge 
test does is assess what the applicant knows 
about the particular situation in which he 
or she will be working. The same idea is 
captured by Robert Sternberg’s emphasis on 
practical intelligence, which would include 
job knowledge. Sternberg and his col¬ 
leagues have provided such tests, and they 
have on occasion shown some incremental 
validity. 39 

The practical intelligence tests Sternberg 
and his colleagues have described are very 
close to job knowledge tests. For instance, 
one practical intelligence test, designed for 
Alaskan hunters, asked what different pieces 
of evidence mean as indicators of coming 

39 See Sternberg, 2003, for a general discussion, and
Sternberg et al., 2000, for a compendium of many
of the studies.


weather. 40 Such questions measure crystal¬ 
lized intelligence (Gc) within a specialized 
context. 41 

10.3.3. Upper-Level Managerial and 
Professional Positions 

The results presented so far are based largely
on studies of blue-collar and
white-collar jobs, up to the lower managerial
level. In this population of occupations
the correlation between general intelligence 
test scores and job performance generally 
rises with increasing job complexity. 42 Given 
this fact, it would be reasonable to expect 
the correlation to be still higher for high- 
level managerial, executive, and professional 
positions. However, there are reasons not 
to assume a straightforward extrapolation 
of the results to the managerial/professional 
class. 

Many studies of high-level occupations 
report the observed correlation between test 
scores and measures of job performance, 
but cannot correct for selection restriction 
because there is no data on the applicant 
population. This is serious, because the 
selection effects are likely to be large. High- 
level positions are quite competitive, and 
are virtually always filled by people in the 
upper quartile of the intelligence range, IQ 
110 and above. It is also difficult to find a 
measure of how well a professional or exec¬ 
utive is doing, beyond gross judgments of 
satisfactory or unsatisfactory performance. 
As Fortune magazine repeatedly shows in its 
annual survey of executive salaries, the cor¬ 
relations between executive compensation 
and objective measures of company perfor¬ 
mance are close to zero. Physicians, attor¬ 
neys, and other professionals are evaluated 
periodically, but the ratings are often lim¬ 
ited to certification of competence without 
any further differentiation. 
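
The size of the selection effect can be illustrated by simulation. The sketch below is hypothetical throughout: it assumes a population validity of .5, bivariate-normal scores, and incumbents drawn strictly from the top quartile of test scores. It then applies the standard correction for direct range restriction (often called Thorndike's Case II), which I have swapped in as an example; the text does not say which correction its sources used. The correction needs the applicant standard deviation, which is exactly the information that studies of high-level occupations usually lack.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def sd(values):
    """Sample standard deviation."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correct_for_range_restriction(r_restricted, sd_applicants, sd_selected):
    """Correction for direct selection on the predictor (Thorndike Case II)."""
    u = sd_applicants / sd_selected
    return r_restricted * u / math.sqrt(1 - r_restricted ** 2 + (r_restricted * u) ** 2)


TRUE_VALIDITY = 0.5  # HYPOTHETICAL population value
rng = random.Random(1)
applicants = []
for _ in range(40000):
    test = rng.gauss(0, 1)
    performance = TRUE_VALIDITY * test + math.sqrt(1 - TRUE_VALIDITY ** 2) * rng.gauss(0, 1)
    applicants.append((test, performance))

# Keep only the top quartile of test scorers, as if only they held the job.
applicants.sort(key=lambda pair: pair[0])
selected = applicants[len(applicants) * 3 // 4:]

all_tests = [t for t, _ in applicants]
r_all = pearson_r(all_tests, [p for _, p in applicants])
sel_tests = [t for t, _ in selected]
r_selected = pearson_r(sel_tests, [p for _, p in selected])
r_corrected = correct_for_range_restriction(r_selected, sd(all_tests), sd(sel_tests))

print(f"applicants: {r_all:.2f}  incumbents: {r_selected:.2f}  corrected: {r_corrected:.2f}")
```

Under these assumptions the correlation among incumbents is far below the applicant-population value, and the correction recovers it only because the simulation knows the applicant standard deviation.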

It is also often hard to acquire the 
required data on intelligence. People who 

40 Grigorenko et al., 2004. 

41 See Gottfredson, 2003a, and Hunt, 2008, for expan¬ 
sions on this point. 

42 Gottfredson 1997, 2002. 
occupy high-level positions are busy, and
usually see no need to have their cognitive 
skills evaluated. As a substitute for direct 
observation, many studies look at profes¬ 
sional training rather than on-the-job per¬ 
formance. As was shown in the section on 
post-graduate education, cognitive tests do 
surprisingly well in predicting completion of 
professional and managerial training. 

There is also the problem of the multidi¬ 
mensionality of the criterion. People who 
occupy high-level positions are typically 
asked to do a number of different tasks. 
These range from high-level planning to 
public relations, face-to-face leadership, and 
negotiations. The relative importance of dif¬ 
ferent tasks varies greatly across occupations 
and even from time to time within an occu¬ 
pation. It is not surprising that there has 
been a good deal of resistance to the idea 
that any unidimensional measure could pre¬ 
dict performance at high levels. This has led 
to three different approaches. 

In order to evaluate highly intelligent 
people using the conventional psychomet¬ 
ric paradigm we have to have harder tests. 
Examples are the advanced version of the
Raven tests, the Raven Advanced Progressive
Matrices (RAPM), and the Miller Analogies
Test (MAT), which contains difficult
verbal analogy problems. These tests do predict
job performance, with observed correlations
in the .15-.30 range, depending upon
the criterion used to evaluate performance,
and with a predictive ability on the order of
.40. 43 This is somewhat below the level of
prediction obtained for higher-level skilled 
work, but still a reasonable figure. 

43 Kuncel, Hezlett, & Ones, 2004; Raven, 2008b.

It is somewhat more enlightening to look at a single large prospective study.
During the 1960s the Bell Telephone System, at the time a near-monopoly
covering telephone services in the United States, used the assessment center
technique to select beginning managers. Management trainees spent several
days in an assessment center, where they were rated for their ability to
solve complicated problems both individually and in groups, and also given a
cognitive test similar to the SAT reasoning tests, along with several
personality tests. The results of these assessments were carefully shielded
from their superiors in the company, so that the test scores could not
influence supervisors’ judgments or hiring decisions. (By contrast, in the US
military a service person’s scores on entry tests are part of his or her
service record, and hence are available to commanders and promotion boards.)
Twenty years later the assessment center results were validated by
determining whether they predicted the level of management the candidate had
achieved. The cognitive test (r = .38) was by far the best predictor. 44 As
would be suggested by Schmidt and Hunter’s analyses, personality tests had
lower validity than cognitive tests, but did add substantially to
predictivity.

Because high-level performance is said 
to be so multidimensional, some interest¬ 
ing alternatives to the conventional testing 
methods have been developed. One of the 
more popular of these is the situational judg¬ 
ment test. In this test an examinee is asked 
what he or she would do in a realistic, dif¬ 
ficult situation. An example I particularly 
like, and that has appeared in a number 
of guises, is asking an applicant for a mid¬ 
dle management position how they would 
inform their own supervisor that the super¬ 
visor’s pet project was not working. As the 
example illustrates, an attempt is made to 
design situational judgment tests that draw 
on both cognitive skill narrowly defined and 
the examinee’s social skills. Situational judgment
tests add .06 to the predictive validity that can be achieved by a
cognitive test alone - not a large amount,
but enough to be worthwhile in a large-scale
assessment program. 45 It is worth noting,
though, that a situational judgment test asks 
the examinee what he or she would do in a 
hypothetical situation. It does not immedi¬ 
ately follow that that is what the examinee 
would do, if placed in an actual, possibly 
emotional situation. 

44 Howard & Bray, 1988. 

45 McDaniel et al., 2001. 

To summarize, general cognitive ability 
is the best single predictor of executive/ 
professional-level performance, just as it 
is of performance in the middle to high- 
end range of the general workforce. Predic¬ 
tion of executive/professional performance 
is somewhat less accurate than prediction of 
general workplace performance. There are 
several reasons why this might be so. They 
include difficulties in defining and obtaining 
measures of job performance, the extreme 
restriction in range of intelligence among 
applicants for high-level positions, and, pos¬ 
sibly, the fact that general cognitive abil¬ 
ity is a less dominating factor, compared 
to other dimensions of intelligence, in the 
upper ranges of cognitive competence than 
in the lower (see Chapter 4). Nevertheless, it 
is reassuring to know that among the movers 
and shakers in our society intelligence does 
count. 
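
The restriction-in-range point in the summary above can be made concrete with a small sketch. It uses the textbook correction for direct selection on the predictor (often called Thorndike's Case II). This is an illustration only: the studies cited here may have applied different or additional corrections, and the function name, the observed correlation, and the standard deviation ratio below are all hypothetical.

```python
import math


def correct_for_range_restriction(r_observed: float, sd_ratio: float) -> float:
    """Estimate the unrestricted correlation from a restricted-sample one.

    r_observed: correlation observed in the selected (restricted) group.
    sd_ratio:   predictor SD in the unrestricted population divided by the
                predictor SD in the restricted group (hypothetical input).
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    numerator = r_observed * sd_ratio
    denominator = math.sqrt(1.0 - r_observed**2 + (r_observed * sd_ratio) ** 2)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical case: r = .25 among applicants whose test-score SD is
    # two-thirds of the population SD, so sd_ratio = 1.5.
    print(round(correct_for_range_restriction(0.25, 1.5), 2))
```

With these made-up inputs the corrected value is noticeably larger than the observed one, which is the direction of the effect described in the text. When the ratio is 1 (no restriction) the function returns the observed correlation unchanged.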

10.3.4. The Rewards for Cognitive Skills 
in the Workplace 

The previous sections have shown that there 
is a positive relation between intelligence 
and workplace performance, within both 
military and civilian occupations. This is the 
sort of information employers want to have. 
From the viewpoint of an individual enter¬ 
ing the workforce, the question is rather dif¬ 
ferent. The individual wants to know what 
sort of economic niche he or she is likely 
to occupy, given a certain level of intelli¬ 
gence. As an ancillary question, what sort of 
rewards can the intelligent person look for¬ 
ward to, in terms of either money or occu¬ 
pational prestige? 

Determining the statistical relationship 
between intelligence and rewards or prestige is straightforward. 
You look at the correlation between 
test scores and some index of rewards. This 
can be done on an individual basis, or, 
as is sometimes easier to do, researchers 
can determine the typical level of intelli¬ 
gence of people in different occupations, 
and then look at the prestige and economic 
rewards offered by those occupations. How¬ 
ever, as always, correlation does not neces¬ 
sarily mean causation. 


In Chapter 1 I introduced the challenge 
hypothesis, the idea that within genetically 
prescribed limits people will increase their 
intelligence in response to a cognitively chal¬ 
lenging environment. It has been shown, for 
instance, that there are qualitative differ¬ 
ences in the reasoning skills of psycholo¬ 
gists, physical scientists, and lawyers. Psy¬ 
chologists, who receive substantial training 
in statistics, are more sensitive to arguments 
based on probabilities than are people in the 
other two fields. 46 Were the psychologists, 
physical scientists, and lawyers attracted to 
their fields because the demands of the field 
matched their preferred styles of reason¬ 
ing, or were the styles of reasoning deter¬ 
mined by their experiences? The best way 
to answer this question is by a prospective 
study, where a person’s intelligence is deter¬ 
mined before he or she enters the workforce, 
and then related to the person’s subsequent 
work history. 

During World War II the United States 
Army Air Force (USAAF, the predecessor 
of today's Air Force, USAF) tested large 
numbers of young men who had applied to 
serve as aviation officers. 47 Two Columbia 
University psychologists, Robert Thorndike 
and Elizabeth Hagen, located approximately 
10,000 of the men about twelve years after 
they had been tested. 48 At that time most of 
the men were in their early to mid-thirties. 

Table 10.3 shows the mean test scores on general reasoning, numerical, and
perceptual-motor scales of the original test, for men in selected
occupations. These estimates were then converted to IQ ranges. The cadets who
eventually entered those occupations that we consider more generally
intellectually challenging, or that require a considerable amount of
education, were also the cadets who had, at age twenty-one, scored high on
the general reasoning tests. As we scan down the general reasoning scale we
begin to encounter white-collar office jobs, and then various blue-collar
jobs that, although they may require considerable skill, are less
intellectually demanding and can be learned on the job rather than via formal
education.

46 Amsel, Langer, & Loutzenhiser, 1991.

47 Women were not accepted for aviation cadet training. Some women who had
already qualified as aviators in the civilian sector did serve.

48 Thorndike & Hagen, 1959.

Table 10.3. Cognitive skills assessed in USAAF aviation cadets, shown by
occupation followed after WW II. The far left-hand column shows estimated
general intelligence measures using the conventional IQ scale. The estimate
is based on a conversion of the general reasoning score, assuming that 0 on
that scale corresponds to 105 on the IQ scale. Right-hand columns show mean
scores achieved at approximately age twenty-one on three different composites
of a testing battery. All three scales have a mean of 0 and a standard
deviation of 100 in the sample of cadets.

Estimated     Occupation                   General      Numerical    Visual-
IQ Range                                   Reasoning                 Perceptual

>115          Chemical engineer               106           42           30
              Mechanical engineer              93           34           44
              Physical scientist               80           22           23
              College professor                75           38           38
              Civil engineer                   75           31           56
              Electrical engineer              65            6            9
110 to <115   Physician                        59           20           18
              Treasurer/comptroller            55           96           23
              Industrial engineer              44           31           34
              Lawyer                           39           22           -7
              Personnel manager                33           18           13
              Pharmacist                       29           39           -9
105 to <110   Dentist                          28           20           15?
              Accountant/auditor               28           54           -4
              Optometrist                      14           34           -4
              Clergyman                        13            1          -17
              Airplane pilot                   13           10           -1
              Real estate salesman              6           17            6
100 to <105   Office manager                    4           33            9
              Insurance underwriter             3            2           -9?
              Veterinarian                     -8           -2          -20
              Insurance claims adjuster       -13           -5           -9
              Bricklayer                      -24           -5          -38
              Radio/TV repairman              -33          -37           21
95 to <100    Hardware salesman               -36          -12           -9
              Sales clerk                     -40          -22          -28
              Plumber                         -42          -21          -31
              Carpenter                       -44                        -4
              Police detective                -50          -26          -20
              House painter                   -63          -12          -24
<95           Crane operator                  -66      [illegible]      -37
              Vehicle mechanic                -72          -65           -7
              Assembler (in factories)    [illegible]      -76          -40

Source: Data selected from Thorndike and Hagen, 1959.

Some pairwise comparisons are interesting. Cadets who became physicians had
general reasoning skills similar to those of cadets who became
treasurers/comptrollers (i.e., high-level financial managers), but the
treasurer/comptrollers had higher numerical skills than the physicians.
Cadets who became college professors had the same general reasoning skills as
those who became civil engineers, but the engineers had higher perceptual
skills. This illustrates the point that in many occupations general reasoning
skills have to be augmented by more specific cognitive skills.

Table 10.4. Correlations between AFQT scores obtained at age sixteen to
eighteen and measures of social outcomes when the participants were in their
early thirties

Demographic     Educational    Square Root    Occupational
Group           Attainment     Income         Status Index

White men          .67            .39            .54
Black men          .48            .29            .45
White women        .59            .31            .45
Black women        .53            .44            .47

Source: Data from Scullin et al., 2000, Table 2.

Other pairwise comparisons show that 
occupations can be the same in terms of the 
stress they put on different special abilities, 
but differ in the level of associated general 
reasoning skills. Treasurers and comptrollers 
(high-level financial officers) and accoun¬ 
tants and auditors (financial technicians) 
were both characterized by high reasoning 
and numerical skills, but the cadets who 
became treasurer/comptrollers had higher 
levels of these skills. 

While the Thorndike and Hagen study is 
certainly informative, there are some aspects 
that limit sweeping conclusions. The partic¬ 
ipants, aviation officers, were an intellectu¬ 
ally select group, with an estimated mean 
IQ of 105. The occupations these young 
officers entered, following the war, were 
fairly high on the occupational scale. And 
the test battery was designed for aviation 
cadets. Accordingly, compared to the typi¬ 
cal battery-type IQ test the aviation battery 
was biased toward the evaluation of visual- 
perceptual skills - so much so that verbal 
skills were incorporated within the general 
reasoning factor. Even given these limits, 
a coherent picture emerged: intelligence is 
worth quite a bit. 

Thorndike and Hagen studied the workplace of the mid-twentieth century.
Compared to that workplace, today's workplace places much more emphasis on
the manipulation of data and abstract representations, rather than things. 49
Have these changes in the workplace changed the requirements for cognitive
skills?

49 Hunt, 1995; Reich, 1991; Zuboff, 1988.

To address that question we look at a 
second prospective study using the NLSY79 
data (see panel 9.9). Recall that in this study 
a nationally representative sample of adoles¬ 
cents and young adults took the AFQT. In 
2000 a research group at Cornell University 
investigated the occupational and income 
status, as of 1995-96, for over 2,400 of the 
panelists who had been born in 1963-64. 50 
The men and women in this study were of 
approximately the same age as the former 
aviation cadets studied by Thorndike and 
Hagen, but in a cohort born roughly forty 
years later. 

In general, the Cornell group found 
that there was a moderate positive cor¬ 
relation between AFQT score, educational 
attainment, income, and occupational pres¬ 
tige (SEI). The correlations are shown in 
Table 10.4, reported separately for four 
different demographic groups, White and 
Black men and women. Although the cor¬ 
relations are all positive, and never negli¬ 
gible, they do vary markedly across demo¬ 
graphic groups. As a rough generalization, 
intelligence seems to be a more useful pre¬ 
dictor of future success if you are a White 
man or a Black woman than if you are a 
White woman or a Black man. 

Table 10.4 treats educational attainment 
as an outcome. Education can also be 
thought of as something to be achieved 
en route to further social and economic 
success, rather than as an end in itself. 

50 Scullin et al., 2000. 

Figure 10.7. Predicting square root income (SQRT INCOME) and 
occupational prestige (SEI) in 1995 from educational attainment (in 
years) and AFQT score obtained in 1980. Respondents were sixteen 
to eighteen years old in 1980. Codes: WM - White men, BM - Black 
men, WW - White women, BW - Black women. Data for 
calculations from Scullin et al., 2000, Table 2. 


Not surprisingly, educational attainment 
and AFQT score are substantially correlated 
in this sample, ranging from .67 (White 
men) to .48 (Black men). This is consis¬ 
tent with the data reported in section 10.2, 
relating intelligence to academic achieve¬ 
ment. Figure 10.7 shows that there is little 
added predictive value of knowing a per¬ 
son’s AFQT score, once his or her educa¬ 
tional attainment is known. 

There are two possible interpretations of this finding. One is that education
is the proximal variable that determines socioeconomic outcome and income,
while intelligence acts as a distal variable, determining education but then
playing no further role. The other is that intelligence acts as a proximal
variable, and that education serves as an additional statistical marker for
intelligence beyond the AFQT score. These explanations can be discriminated
by contrasting the partial correlations between AFQT and the outcome
variable, occupational prestige or income, holding education constant, with
those between education and the outcome variable, holding AFQT score
constant. I calculated these and found that for income the partial
correlations for education are generally larger than for AFQT score, but that
the differences are not great. The results for occupational prestige are
striking. The partial correlations for education given AFQT score are .43 for
Blacks (both men and women), and .38 and .28 for White men and women,
respectively. The corresponding values for AFQT given education range from
.16 (White men) to .20 (White women). Evidently education and intelligence
are collinear predictors of income. Intelligence acts as a distal variable,
exerting its influence through education, which then permits entry into
prestigious occupations. The educational effect seems to be stronger for
Blacks than for Whites.
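
For readers who want to see how such partial correlations are obtained, here is a minimal sketch using the standard first-order formula. The two AFQT correlations are the White men entries from Table 10.4. The education-occupational status correlation is not reported in this chapter, so the value used below is a hypothetical placeholder, and the output should not be read as a reproduction of the figures quoted above.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, holding z constant."""
    denom = math.sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))
    if denom == 0.0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# From Table 10.4 (White men).
R_AFQT_EDU = 0.67  # AFQT with educational attainment
R_AFQT_SEI = 0.54  # AFQT with occupational status index

# HYPOTHETICAL: not given in the text, chosen only to make the sketch run.
R_EDU_SEI = 0.60   # educational attainment with occupational status index

# AFQT and occupational status, holding education constant.
afqt_given_edu = partial_corr(R_AFQT_SEI, R_AFQT_EDU, R_EDU_SEI)
# Education and occupational status, holding AFQT constant.
edu_given_afqt = partial_corr(R_EDU_SEI, R_AFQT_EDU, R_AFQT_SEI)

if __name__ == "__main__":
    print(f"AFQT | education: {afqt_given_edu:.2f}")
    print(f"education | AFQT: {edu_given_afqt:.2f}")
```

Comparing the two partial correlations is the contrast described in the text: whichever predictor retains the larger partial correlation carries more information about the outcome that the other predictor does not.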

Another way to determine what the 
workplace is willing to pay for intelligence 
is to examine the test scores of people who 
apply for jobs in various occupations. This 
carries with it the defensible assumption 
that the applicants exert self-selection. Peo¬ 
ple without college degrees generally do not 
apply for entry-level executive positions. 

Gottfredson used a number of stud¬ 
ies of intelligence test scores in job appli¬ 
cants to construct a “life’s chances” chart 
for various occupations. 51 She associated IQ 
equivalents with five classes of occupation, 
ranging from what she regarded as “slow, 
supervised” work to "gathers own infor¬ 
mation." Table 10.5 lists occupations cited 
by Gottfredson, along with her estimates 
of the typical IQ score for an applicant. 

51 Gottfredson, 1997. 

Table 10.5. Gottfredson's examples of intelligence levels (on IQ scale)
associated with different occupations and techniques of information
processing. The columns to the right of Gottfredson's figures show the
intelligence-level estimates obtained from Thorndike and Hagen's study of the
careers of USAAF aviation cadets, approximately fifty years earlier
(Table 10.3) and the range of Wonderlic scores of applicants reported in the
Wonderlic Corporation's Normative Report (April 2007) for the WPT-R, revised
in 2003. For the Wonderlic column, an occupation from the WPT and WPT-R
norming samples comparable to Gottfredson's "typical profession" was used.
Wonderlic scores have been converted to IQ units using the conversion
2 * Wonderlic score + 60 (Dodrill, 1981; Dodrill & Warner, 1988).

Training and            Typical          Gottfredson’s   Estimates Based   WONDERLIC with
Qualification Method    Position         Estimated       on Thorndike-     Comparison Occupation
                                         Typical IQ      Hagen Data
                                         Score

Explicit, hands-on      Assembler        [illegible]     <95               88-104, electro-mechanical
training                                                                   assembler
Mastery learning,       Police officer   95-105          95-100            100-114, police and
hands-on training                                                          sheriff officer
College formal          Accountant       110-120         105-110           102-120, accountant
instruction
Graduate instruction,   Attorney         120+            110-115           110-124, executive
gathers own
information


The table also includes data for comparable occupations, based on the
Thorndike and Hagen data, taken forty years earlier, and for the 2003
revision of the WONDERLIC test, the WPT-R. There is a striking similarity
between Gottfredson's estimates, the WONDERLIC estimates, and the estimates
that Thorndike and Hagen had made forty years earlier. The data was gathered
using different sampling methods, in different workplaces separated by over
half a century, and the tests used were quite different. 52 Nevertheless, the
estimates are basically the same. Comparing Tables 10.4 and 10.5, it appears
that the role of intelligence in today’s workplace is very much the same as
it was before the term "information technology” was invented.

52 Thorndike and Hagen’s general reasoning factor was extracted from a
battery of subtests that took several hours to complete. Gottfredson’s
estimates were based on data from the Wonderlic Personnel Test (WPT). The
current Wonderlic estimates are based on the revised version, WPT-R, with
data from 2003. See Chapter 2 for a description of the WPT.
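
The Wonderlic column of Table 10.5 relies on the linear conversion given in the table caption (IQ = 2 * Wonderlic score + 60). A short sketch of that arithmetic follows; the function names are my own, and the inverse function is simply the algebraic rearrangement of the same rule.

```python
def wonderlic_to_iq(wonderlic_score: float) -> float:
    """Convert a Wonderlic score to IQ units: 2 * score + 60."""
    return 2 * wonderlic_score + 60


def iq_to_wonderlic(iq: float) -> float:
    """Invert the conversion: the Wonderlic score implied by an IQ value."""
    return (iq - 60) / 2


if __name__ == "__main__":
    # The 88-104 IQ range listed for electro-mechanical assemblers
    # corresponds to Wonderlic scores of 14 to 22 under this rule.
    low, high = iq_to_wonderlic(88), iq_to_wonderlic(104)
    print(low, high)
    print(wonderlic_to_iq(low), wonderlic_to_iq(high))
```

Because the rule is linear, each Wonderlic point is worth two IQ points, so the IQ ranges in the table are twice as wide as the underlying Wonderlic score ranges.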

A second retrospective study provides 
a more comprehensive look at general 
levels of occupational accomplishment, 
but does not provide data on individual 
occupations. 

In the late 1980s the Centers for Disease Control conducted a follow-up study
of the health of people (primarily men) who had served in the military during
the Vietnam War era (1967-76). A large number of veterans were contacted and
asked to participate in extensive physical and mental testing. Helmuth
Nyborg, a Danish psychologist, and Arthur Jensen related the veterans’
current occupational status to their educational attainment and to a measure
of g extracted from their test scores, as of the late 1980s. 53 Figure 10.8
shows the results of Nyborg and Jensen’s analysis. Data are shown separately
for black and white veterans, as they differed on educational, intelligence,
occupational, and income measures.

53 Nyborg & Jensen, 2001.

[Figure 10.8: two bar-chart panels, Income (top) and Occupational Status
(bottom). Groups: Together, Black, White. Legend: Together, Ed, g. Axis:
correlation (r) or multiple correlation, 0 to 0.5.]

Figure 10.8. Correlations and multiple correlations between income (top) and
occupational status (bottom) and general intelligence and educational level.
The calculations are based on Table 3 of Nyborg & Jensen, 2001.

In this sample the intelligence test score 
was a slightly better predictor of income 
than was educational achievement, while 
the reverse was true for the NLSY data. 
It would be unwise to make very much 
of this. The samples were different, the 
tests were different, and the NLSY analy¬ 
sis attempted to predict future accomplish¬ 
ment from measures taken in adolescence, 
while Nyborg and Jensen related current test 
performance to current accomplishment. In 
any case, the similarities are far greater than 
the differences. 

Both educational level and intelligence are substantial predictors of
accomplishment in the workplace. The two are highly correlated. This is
hardly surprising. Both measures are based on the development and display of
cognitive skills.

10.3.5. What the Jobs Demand 

Another way of determining the value of 
intelligence in the workplace is to analyze 
the job requirements of a large number of 
occupations, and infer what this means in 
terms of the demands on intelligence. The 
first step is to make an analysis of the relative 
value of cognitive skills for different jobs. 

The US Department of Labor (DOL) maintains an elaborate job counseling
service, in which it describes over 12,000 jobs and rates the extent to which
they require certain skills. The skills rated range from general reasoning
ability to finger dexterity. The rating system was originally incorporated
into a descriptive volume called the Dictionary of Occupational Titles (DOT).
The DOT has been superseded by an on-line, interactive system called O*NET.
O*NET is a considerable expansion over the DOT, designed primarily to help
job seekers. As a side benefit, it contains a massive amount of data
available to researchers interested in issues involving workforce skills.

[Figure 10.9: bar chart. Axis label: GATB Scale. Legend: Physical Science,
Equipment Operation, Law, Child & Adult Care.]

Figure 10.9. Recommended GATB cut points for four occupations, taken from
high-status and lower-status patterns, within the class of occupations
dealing with physical or with social relationships. GATB scales are
G - general reasoning, V - verbal, N - numerical, and S - spatial. Data
selected from Gottfredson, 1986, Table 2. In cases where no value is
specified the DOT analysts had made the judgment that virtually anyone would
have sufficient ability to do the job. For instance, there is no cut point
for spatial reasoning required for the law, and no cut point for general
reasoning required for heavy equipment operation.

Gottfredson utilized the original DOT system to construct a “space” of
jobs. 54 Her analysis coordinated job ratings with data on the test
performance of people who had applied for, or were occupying, a variety of
jobs. She identified five classes of occupations - those dealing with
physical relations, social and economic relations, maintaining bureaucratic
order, and performing. (She also had a small class of “leftover” occupational
patterns that will not be dealt with further.) Within each of these classes
she identified the patterns of aptitudes required. For example, within the
class of occupations dealing with physical relations there was a cluster of
occupations that dealt with research and design (e.g., physicist, engineer)
and a cluster that dealt with building, maintaining, or operating physical
objects (e.g., equipment operators, craftsmen). Within the class dealing with
social and economic systems one cluster dealt with research and design
(including social research, law, and finance), while another dealt with
providing service to individuals (including hospitality services, and child
and adult care).

54 Gottfredson, 1986.

Individual jobs within a cluster could be associated with a pattern of
abilities, as defined by the DOL’s General Aptitude Test Battery (GATB),
which was in use at the time. The DOL used these values to recommend minimum
values (cut points) for a job along each of four GATB dimensions: general
reasoning, verbal reasoning, numerical skills, and spatial skills.
Figure 10.9 provides examples for four occupations, two from Gottfredson's
class of occupations dealing with physical relations and two from the class
dealing with social relations. Within each class one occupation was selected
from Gottfredson’s high-status cluster, and the other from a lower-status
cluster. Physical scientists and lawyers, high-status occupations from the
physical relations and social relations clusters, have similar high-level
patterns except that lawyers are not required to have spatial reasoning
skills, and do not have quite as high scores on other scales as the physical
scientists do. Equipment operators have to have a certain minimal level of
spatial reasoning skills, while personal caretakers have to have a minimal
level of general reasoning skills.

Figure 10.10. The ninetieth, fiftieth (median), and tenth percentiles of
incomes in various occupations, plotted as a function of the imputed
intelligence demands for the occupation. Data from US Census Bureau and from
Hunt and Madhyastha's analysis of job demands.

Gottfredson found that general intelli¬ 
gence was by far the biggest driver of varia¬ 
tions in cognitive skills across occupations. 
Verbal, numerical, and spatial skills were 
important in some occupations, but they 
accounted for much less of the variation 
in the descriptions of occupational require¬ 
ments than did differences in requirements 
for general reasoning. 

Tara Madhyastha and I have analyzed the modern O*NET data (as of 2008) using
different techniques than Gottfredson had used, but intended to answer
basically the same questions. In general, our analysis of cognitive demands
agreed well with Gottfredson's. 55 We found that the ratings of skills could
be described by a two-dimensional space, where one dimension was a general
reasoning factor and the other was a bipolar factor, indicating whether the
job emphasized verbal or perceptual-motor skills. We also found a smaller
factor indicating the extent to which a job required numerical skills. The
factors were not statistically independent, because most jobs that require a
high degree of general reasoning also require fairly high verbal skills, just
as Gottfredson had noticed. This is not surprising; people who hold
intellectually demanding jobs usually have to communicate with other people.
In spite of comic strip stereotypes, real computer programmers spend a great
deal of time describing what their programs do, and how they fit into suites
of programs developed by other people.

55 Unfortunately it is not possible to compare our analysis to the data
obtained by Gottfredson or by Thorndike and Hagen, because our analysis was
at a much finer level of detail.

Our analysis also allowed us to relate the incomes associated with different
occupations to the cognitive demands of those occupations. The relations to
general intelligence are shown in Figure 10.10. The income associated with
occupations increased as the occupations demanded higher levels of
intelligence. However, there appeared to be a nonlinear increase; incomes are
fairly flat over occupations that require slightly more than average
intelligence. Incomes are more closely related to demands on intelligence if
the occupation requires an IQ of more than about 105 (g > .3 on the standard
deviation scale). What is even more striking is the range of incomes
associated with a given level of cognitive demand. This is clearly shown in
the figure. The range in annual income between the 10% best-paid and 10%
worst-paid occupations with a cognitive demand of 100 (0 in standard
deviation units) was about $65,000 (2005 dollars). The range for jobs with a
cognitive demand of 116 (1 in standard deviation units) was roughly $140,000.

Figure 10.10 also shows that the distribu¬ 
tion of income is markedly positively skewed 
at all levels of intelligence. The absolute dif¬ 
ference between the median and the nineti¬ 
eth percentile is always greater than the 
absolute difference between the median and 
the tenth percentile. The extent of the skew 
increases with increasing imputed intelli¬ 
gence, a trend that begins much sooner than 
the rise in median income as a function of 
imputed intelligence. 
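
The skew comparison described above can be expressed as a simple percentile-based index: the gap between the ninetieth percentile and the median, minus the gap between the median and the tenth percentile, divided by the total spread. The sketch below is illustrative only. The function name is my own, and the income figures are hypothetical, chosen so that the spreads match the roughly $65,000 and $140,000 ranges quoted earlier; they are not the values plotted in Figure 10.10.

```python
def percentile_skew(p10: float, p50: float, p90: float) -> float:
    """Percentile-based skew index.

    Returns 0 for a symmetric spread, a positive value when the upper tail
    (p90 - p50) is longer than the lower tail (p50 - p10), and a negative
    value in the opposite case.
    """
    spread = p90 - p10
    if spread <= 0:
        raise ValueError("p90 must exceed p10")
    return ((p90 - p50) - (p50 - p10)) / spread


# HYPOTHETICAL incomes (2005 dollars) for two levels of cognitive demand.
demand_100 = {"p10": 25_000, "p50": 40_000, "p90": 90_000}    # spread 65,000
demand_116 = {"p10": 30_000, "p50": 60_000, "p90": 170_000}   # spread 140,000

if __name__ == "__main__":
    for label, incomes in (("100", demand_100), ("116", demand_116)):
        print(label, round(percentile_skew(**incomes), 3))
```

An index above zero at every level of imputed intelligence, rising as the level rises, would correspond to the pattern the text describes.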

10.3.6. Summary: The Role of Intelligence 
in the Workplace 

The quotations that began this section, 
from Gottfredson and from Kuncel and col¬ 
leagues, were accurate. Intelligence predicts 
a person’s job status and income better than 
any other trait that has been studied. This 
leaves us with two questions: why does the 
association exist, and why is it that the asso¬ 
ciation is so widely denied by people who 
have not studied the topic? 

The influence of intelligence is undoubt¬ 
edly mediated in part by education. This 
is especially true across fields, for the vast 
majority of the more lucrative occupa¬ 
tions have, as an entry requirement, at 
least a college education. The best-paid 
professional occupations require substan¬ 
tial graduate-level training. It is increasingly 
the case that the entry routes to skilled 
trades, such as auto mechanic, involve aca¬ 
demic certification through community col¬ 
lege or other professional training programs. 
Because educational attainment is strongly 
related to intelligence, any variable that is 
correlated with educational attainment will 
also correlate with intelligence. 

In the case of the professions education is 
essential; no one wants to have a surgeon (or 
an airline pilot) who is learning basic skills 
on the job. To the extent that intelligence 
is required to get through a rigorous training 
program, intelligence and education are 
entwined. 


Charles Murray has argued that in many 
cases education is not really needed for suc¬ 
cess in a field, but that education is used as 
a (socioeconomic?) screening device. 56 To 
the extent that this is true a spurious rela¬ 
tion between intelligence and economic suc¬ 
cess could be created, via a real relationship 
between intelligence and educational attain¬ 
ment and a spurious one between educa¬ 
tional attainment and economic success, cal¬ 
culated across occupations. However, this is 
not completely the case. Both intelligence 
and education are important, because there 
are nonzero partial correlations between 
indices of workplace success and either intel¬ 
ligence or education, after the other has 
been held constant. 

The fact that intelligence is correlated 
with on-the-job performance in a wide range 
of military and civilian occupations, includ¬ 
ing ones that do not have high educa¬ 
tional requirements, provides further evi¬ 
dence that intelligence is important in itself, 
not just as a facilitator of education. 

10.4. The Social and Economic 
Prospects at the Ends of the Bell Curve 

If intelligence is an important trait in our 
society, then the lifetime prospects of peo¬ 
ple on the two extremes of the distribution 
of intelligence should be very different. And 
they are. In this section we take a look at 
the careers of some people who are at the 
upper and lower ends of the intelligence 
distribution. 

It is important to be clear that we will 
not be looking at people who are conven¬ 
tionally labeled “geniuses” or at people who 
are mentally handicapped to the point that 
they cannot function in our society without 
special help. There are reasons for avoiding 
these extremes. 

56 Murray, 2008.

The term “genius” is usually applied to people who have accomplished
great things. The consensus of people who have studied genius is that
extraordinary accomplishment in any field requires some talent, but
also a great deal of motivation and very hard work. The social support
network must be right. Howard Gardner makes this point in his excellent
study of extreme creators in such varied fields as physics (Einstein),
writing (T. S. Eliot), politics (Gandhi), and art (Picasso). 57
Gardner's subjects were geniuses in any sense of the word. They were
all very bright. They also had a single-minded sense of purpose and a
social network of people who were willing to support their
single-minded efforts. In addition, the times must be right. An unknown
Sumerian genius invented the wheel around 3500 BCE. His or her
invention spread rapidly through the ancient world, for it greatly
improved the utility of oxen and horses. Perhaps 2,500 years later, and
completely independently, an equally unknown Aztec invented wheels for
children’s toys. The idea never went further, for the Aztecs had no
beasts of burden. Cognitive traits undoubtedly are important in the
creation of genius, but so are noncognitive traits and features of the
situation. Studying acknowledged geniuses is important in itself, 58
but it is not a good way to determine how high intelligence affects
one’s progress through life.

At the other end of the scale, there is little 
point in studying the lives of the extremely 
mentally disabled, who simply cannot cope 
with our society. Determining the sorts of 
social support these unfortunate individu¬ 
als require is an important topic, but one 
that is far beyond the scope of this book. 
What we can do is to examine the lives of
people who fall in the “low normal” range,
roughly IQ scores of 70-85. Most of these
individuals are productive members of the 
society, but seldom maximally productive 
members. 

57 Gardner, 1993b.

58 See, especially, Simonton, 1984.

Once again, we have to balance the relative costs and benefits of
retrospective and prospective studies. There have been numerous
retrospective studies of the characteristics of high achievers,
ranging from artists to politicians. 59 There have been
even more studies of various low-achieving 
groups, such as welfare recipients and crim¬ 
inals. In both cases it is possible to find some 
traits that seem to characterize the target 
group. However, as is almost always the 
case with retrospective studies, it is hard to 
interpret these findings. For instance, eigh¬ 
teen of the first forty-four presidents of the 
United States received earned degrees from 
one or more of just five colleges. 60 Does this 
mean that if you attend one of these colleges 
you have a good chance of becoming president?
Hardly. Only a minuscule fraction of
the graduates of these colleges attained the 
presidency. 

At the other end of the social scale, it 
has been estimated that about one in every 
six homeless persons has either schizophre¬ 
nia or manic-depressive psychosis. Does 
this mean that a person who suffers from 
either of these diseases has roughly one 
chance in six of becoming homeless? Hardly. 
Approximately 3% of the US population, 
roughly nine million people, suffers from 
one of these two diseases. Some 200,000 are 
both homeless and are either schizophrenic 
or suffer from bipolar disorder. The chances 
are roughly one in forty-five, not one in 
six, of a mentally ill person becoming 
homeless. 
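
The arithmetic behind this correction is worth spelling out, because the two probabilities are easy to confuse. The sketch below uses only the figures quoted in the passage; the total number of homeless people is not stated there and is merely what those figures imply.

```python
# Figures quoted in the passage above.
MENTALLY_ILL = 9_000_000       # roughly 3% of the US population
HOMELESS_AND_ILL = 200_000     # homeless AND schizophrenic or bipolar
HOMELESS_PER_ILL_HOMELESS = 6  # "about one in every six homeless persons"


def one_in(count: int, total: int) -> float:
    """Return N such that count out of total is 'one in N'."""
    return total / count


# P(ill | homeless): the retrospective figure, about one in six.
p_ill_given_homeless = 1 / HOMELESS_PER_ILL_HOMELESS

# P(homeless | ill): the prospective figure the question actually asks for.
p_homeless_given_ill = HOMELESS_AND_ILL / MENTALLY_ILL
n_homeless_given_ill = one_in(HOMELESS_AND_ILL, MENTALLY_ILL)

# Implied by the passage's numbers, not stated in it.
implied_homeless_total = HOMELESS_AND_ILL * HOMELESS_PER_ILL_HOMELESS

print(f"P(ill | homeless) = {p_ill_given_homeless:.3f}")
print(f"P(homeless | ill) = {p_homeless_given_ill:.3f}")
print(f"one in {n_homeless_given_ill:.0f} mentally ill persons is homeless")
```

The two conditional probabilities share a numerator (the 200,000 people who are both homeless and ill) but are divided by very different base populations, which is why one is about one in six and the other about one in forty-five.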

The best way to determine what happens to intellectually gifted or
below-normal individuals is to start with a group of gifted (or
below-normal) persons and follow that group through some portion of
their lives, in a prospective manner, rather than attempting a
retrospective study of people who have had a particular life outcome.

We will look first at a study of the gifted, 
and then examine the low-normal group. In 
each case my discussion will use as illustra¬ 
tion the results of one or two large studies. 

59 See Simonton, 1984, and Gardner, 1993b.

60 The colleges are Harvard (six), the College of 
William and Mary (four), Yale (four), Princeton 
and the US Military Academy (two each). The list 
includes George Washington, who received a sur¬ 
veyor’s certificate from William and Mary. This was 
his only post-secondary education. I have assigned 
George W. Bush to Yale, where he received a B.A. 
He also received an MBA from Harvard. 
Table 10.6. Percentages of men and women in the Terman study who attained various
levels of education, compared to educational achievements of their cohort

Group                         Men in         Women in       General        General
                              Terman Study   Terman Study   Public, Men    Public, Women

College graduates             70             67             10             6
Graduate studies, given       56             33             19             Unknown
  graduation from college
Doctorates                    14             4              2              2

Source: Data from Terman & Oden, 1959.


10.4.1. The Gifted I: The Quiz Kids
and the Termites

During the 1940s and 1950s there was a pop¬ 
ular radio program called The Quiz Kids. A 
panel of very knowledgeable six- to sixteen- 
year-old children answered questions that 
required anything from an ability to do 
rapid mental calculations to knowing rather 
obscure scientific and historical facts. Their 
performance was impressive. Some were 
said to have IQs of 200, but that was 
apparently a score based on the old mental 
age/chronological age calculation. A more 
realistic estimate is that the IQ scores fell
in the 140-160+ range, roughly the top one
in one thousand.

About thirty years later one of the former 
Quiz Kids, who had become a professor, 
located a number of them to see how they 
were doing. 61 The commonest answer was “generally quite well, thank
you.” An inordinate number of them had followed academic professions.
The others were mostly in professional fields. One had received the
Nobel Prize in Physiology or Medicine. 62 To be
sure, not every one of the Quiz Kids had 
done well, and some had rather unhappy 
lives. But, on the whole, they were suc¬ 
cesses. 

The Quiz Kids hardly represents a scientific study. Candidates were
recruited rather informally from the general Chicago area. I am sure
the selection of candidates depended on both the child’s apparent
intelligence and the radio show producers’ judgment about how appealing
the child would be on the radio stage. Fortunately, more formal studies
have been done.

61 Feldman, 1982.

62 James Watson, for the discovery of the structure of DNA.

The Quiz Kid idea was a popularized ver¬ 
sion of a well-known study that had been 
(and was being) carried out by Lewis Terman,
the Stanford University professor who
introduced the Binet tests into the United 
States. In the early 1920s Terman asked 
teachers in California schools to nominate 
exceptionally bright children for a long-term 
study. The children were then given IQ 
tests, and those who scored 140 or above (one 
in a thousand) were invited to participate. 
Eventually 1,528 students were enrolled. The 
study continued after Terman’s death in 
1956. Eventually the “termites,” as they were 
sometimes called, were followed into their 
seventies. 63 

The results were clear in one way, and 
difficult to interpret in another. Terman’s 
participants were born around 1910-20, so 
the cultural aspects of their time have to be 
kept in mind. In spite of living through a 
depression and a world war, they did exceptionally well. A few
statistics from the last study in which Terman himself participated,
at which time the “termites” were in their fifties, show what had
happened. 64

By the 1950s virtually all of the people in the study had completed
their education. Table 10.6 compares their achievements with those of
people born in roughly the same cohort. Clearly the gifted had much
greater educational attainment than was typical of the time. The
college graduation rate for the gifted, who went to college during the
Great Depression, was higher than the general graduation rates at the
start of the twenty-first century! This was true for both men and
women.

63 Holahan & Sears, 1995.

64 Terman & Oden, 1959.

Table 10.7. Family income distribution of Terman study participants in
the 1950s, compared to “urban white families” at that time. Income
figures are in 1950s dollars. The study participants earned much more
than the base rate established for similar families.

Group                         “Urban White Families”   Terman Participants

Income > $15,000              1%                       30%
$15,000 > income > $5,000     36%                      64%
$5,000 > income               63%                      6%

Source: Data from Terman & Oden, 1959.

Over 80% of the men in the study fol¬ 
lowed professional or business careers. As 
was typical of the times, many of the women 
became homemakers. Table 10.7 contrasts 
the family incomes of the study group to 
a group that Terman referred to as “urban 
white families.” This was an appropriate 
comparison group, for the participants were 
themselves predominantly urban and white. 

The final follow-up of this group, when 
they were in their seventies, reinforced the 
picture. The “termites” had achieved high 
educational levels and had had successful 
careers. They were healthy and satisfied. 
Their marriage rates were high, compared 
to their cohorts, and their divorce rates were 
low. The incidence of severe mental ill¬ 
nesses, alcoholism, and other types of social 
dysfunction was similarly low. The study 
gave no support whatsoever to the stereo¬ 
type of the sickly, neurotic genius. Nor, I 
add, has any other study of the gifted. 

To what extent was the success of the 
termites due to their intelligence? Here the 
answer is not so clear. Terman’s work has 
been criticized on three grounds. The most 
serious is that reliance on teacher reports and 
personal contacts biased the study toward 
the selection of upper-middle-class urban
Whites. The second is that Terman actively
interfered in the lives of the participants, 
through interviews and mail contacts. The 
third is that the participants were not really 
geniuses. 

Terman’s selection methods were biased 
toward selecting children from high SES, 
White homes. On the average, although not 
in all cases, Terman’s participants benefited 
from strong social support, such as having 
families who could support them during 
their college years, and having the social con¬ 
tacts that facilitated success. In hindsight, it 
would have been nice if Terman had also 
followed children from a comparable SES 
group who did not have unusually high IQ 
scores. This would have greatly increased 
the cost of the study. The disparity between 
success rates in Terman’s group, compared 
to base rates in similar social groups, is so 
great that I do not think that the bias toward 
upper SES participants could have entirely 
accounted for the results. 

Terman’s selection methods were biased 
against identifying highly intelligent chil¬ 
dren in low SES families or in minority 
groups. The bias against minorities was a 
side effect of the bias toward selecting chil¬ 
dren from families with relatively high SES, 
because in California in the 1920s the dif¬ 
ference in SES between Whites and Blacks 
or Latinos was much greater than it is 
today. There may have been many talented 
individuals who should have been in Ter¬ 
man’s group but were not selected. How¬ 
ever, Terman never set out to study all, 
or even a representative group, of gifted 
schoolchildren in California. His intention 
was to find some gifted students and follow
them literally throughout life. He did this. 
What the distribution of gifted is in the gen¬ 
eral population is a different question. 

Would the conclusions have been any dif¬ 
ferent had there been an aggressive attempt 
to recruit bright Black and Latino children? 
We will never know. 

10.4.2. The Gifted II: The Study of
Mathematically Precocious Youth
and Related Studies

We now turn to a more modern, much 
larger study that is every bit as ambitious as 
Terman’s was, but has a more clearly defined 
recruitment procedure. 

In 1971 Julian Stanley, a professor at Johns Hopkins University, began
the Study of Mathematically Precocious Youth (SMPY). 65 Students in
middle schools (then called “junior high schools”) were urged to take
the SAT when they were twelve or thirteen years old. Recall that the
SAT is designed for students in the third or fourth year of high
school, at age sixteen to eighteen. Stanley initially focused on
mathematical precocity, but in 1973 he began to study verbal precocity
as well.

A two-tiered selection procedure was used. The SMPY researchers
identified those twelve- to thirteen-year-old students who had scored
in the top 3% on standardized tests that had already been given as part
of their school's normal assessment program, and asked them to take the
SAT. Students who scored 500 or higher on either the mathematics or
verbal (SAT-M or SAT-V) portion of the test were asked to participate
in the main study. This score would put a twelve- or thirteen-year-old
student in the top half of seventeen-year-olds. Such a score on the
SAT-M is an impressive accomplishment, because this test evaluates
proficiency in mathematical topics that are often not covered until
late middle or high school. The middle school students either had to
have studied these topics outside of school or had to apply general
reasoning to solve the problems on the test. The researchers regarded
the students who scored 500 or above as being in the top one-half of
one percent (1 in 200) of their age group, while the students who
scored 700 or higher were believed to be in the top one-hundredth of
one percent (1 in 10,000). The design of the study thus permits
comparison between the bright and the extremely bright.

65 Stanley, 1996; Brody & Blackburn, 1996.
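
To get a feel for how far apart "1 in 200" and "1 in 10,000" are, the rarities can be converted into standard-deviation units. The sketch below assumes that the underlying ability is normally distributed and, for the second column, an IQ-style scale with mean 100 and standard deviation 15. Both are assumptions made for illustration; the SMPY researchers' own estimates were not necessarily derived this way.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_for_top_one_in(n: float) -> float:
    """z-score that cuts off the top 1-in-n of a normal distribution."""
    return STANDARD_NORMAL.inv_cdf(1.0 - 1.0 / n)


def iq_equivalent(n: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Same cutoff on an assumed IQ-style scale (mean 100, SD 15)."""
    return mean + sd * z_for_top_one_in(n)


for n in (200, 10_000):
    z = z_for_top_one_in(n)
    print(f"top 1 in {n:>6,}: z = {z:.2f}, IQ-scale equivalent = {iq_equivalent(n):.0f}")
```

Under these assumptions the 1-in-200 cutoff lies roughly 2.6 standard deviations above the mean and the 1-in-10,000 cutoff roughly 3.7, so the two SMPY groups are separated by more than a full standard deviation.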

Stanley died in 2005. The SMPY study 
has been carried on by Stanley's colleagues, 
Camilla Benbow and David Lubinski, at 
Vanderbilt University. Their intent is to 
make SMPY a fifty-year study of the careers 
of people identified as talented in their 
early adolescence. (Stanley's first participants were born in the late
1950s and early 1960s.) In addition to observing the behavior of
talented students, Stanley and his colleagues were interested in the
nurturing of exceptional talent, especially in mathematics. Therefore,
the SMPY has included a
teaching component, in which participants 
are enrolled in intensive summer programs. 

The data that has been gathered at this time (2010), which roughly
carries the participants through their thirties, provides insight into
several aspects of the development of the gifted. These include
reactions to academic environments, academic achievement, and
accomplishments by early middle age. 66

The students took to intensive academic instruction like ducks to
water. The SMPY students could assimilate a year's high school course
work in about three weeks of intensive study. This is consistent with
other reports, indicating that when special instruction is offered it
helps everyone, but it helps the talented students the most. 67 I would
add a caveat to this. The statement is true unless the special
instruction is highly structured and specifically developed for a
low-ability group. In this case the high-ability group’s
accomplishments may actually deteriorate due to loss of interest and
motivation. 68

66 The data cited here is largely taken from Lubinski and Benbow's
(2006) review of the status of the project thirty-five years after the
initial enrollment of students.

67 Ceci & Papierno, 2005.

Figure 10.11. Educational attainment of participants in the SMPY,
by age thirty-three. Members of cohorts 1 and 2 attained scores of
500 or better on the SAT-M by age thirteen. Members of cohort 3
attained scores of 700 or better. For reference, in the general
population in the US as of 2004, 27% of the population twenty-five
or older had received bachelor’s degrees, 9% had master’s degrees,
and 3% held a doctorate of some type, including the Ph.D., M.D.,
LL.D., and Ed.D. degrees. Data from Lubinski & Benbow, 2006.

The academic achievements of the group 
are staggering. Figure 10.11 shows the level of 
academic achievement obtained by SMPY 
participants who had achieved scores of 
either 500 or greater or 700 or greater on the 
SAT-M. Lubinski and Benbow have pointed out that it is, to say the
least, interesting that a two-hour test taken at age thirteen or
younger can predict that the “risk,” if you will, of obtaining a
doctorate has risen from three in one hundred, the U.S. national rate,
to one in two (cohort 3 in Figure 10.11).
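
The size of that shift can be expressed in the usual epidemiological terms. The sketch below takes the two rates quoted above (3 in 100 and 1 in 2) and computes a risk ratio and an odds ratio. The function names are mine, introduced only for this illustration.

```python
def risk_ratio(p_group: float, p_base: float) -> float:
    """How many times more likely the outcome is in the group than at base rate."""
    return p_group / p_base


def odds(p: float) -> float:
    """Convert a probability into odds in favor of the outcome."""
    return p / (1.0 - p)


def odds_ratio(p_group: float, p_base: float) -> float:
    """Ratio of the group's odds to the base-rate odds."""
    return odds(p_group) / odds(p_base)


P_DOCTORATE_NATIONAL = 0.03  # "three in one hundred"
P_DOCTORATE_COHORT_3 = 0.50  # "one in two"

print(f"risk ratio: {risk_ratio(P_DOCTORATE_COHORT_3, P_DOCTORATE_NATIONAL):.1f}")
print(f"odds ratio: {odds_ratio(P_DOCTORATE_COHORT_3, P_DOCTORATE_NATIONAL):.1f}")
```

On these figures the chance of earning a doctorate is about seventeen times the national rate, and the odds are about thirty-two times the national odds. The two measures diverge because the outcome is no longer rare in the high-scoring group.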

What about beyond schooling? At the time of this writing (2010) the
SMPY participants are less than fifty years old. Nevertheless, their
accomplishments are spectacular. Because many of the participants went
into science, engineering, and related careers, a particularly
compelling comparison contrasts the SMPY participants to graduate
students of a similar age who had attended highly ranked university
programs in science or engineering. Figure 10.12 shows this comparison
for a distinctly academic criterion, employment at highly ranked
universities, and a distinctly nonacademic criterion, how much money
the individual was making. By age thirty-three approximately 8% of the
“over 700” men were earning more than $250,000 as individuals. In 2005
(the time of the survey) approximately 1.5% of all US households earned
more than $250,000.

68 Snow, 1982, 1996.

The SMPY was initially motivated by a 
desire to study the life histories of ado¬ 
lescents who showed exceptional promise 
in mathematics. As the study progressed it 
extended its reach to the evaluation of peo¬ 
ple with high SAT-V scores. This made it 
possible to study people whose talents were 
tilted toward either mathematical or ver¬ 
bal skills. It should be remembered, though, 
that people with very high verbal scores will 
probably have above-average mathematics 
scores, and vice versa. 

The tilt definitely affected the type of contribution that was made.
SMPY participants whose strongest test scores were on the SAT-V tended
to major in the social sciences and humanities; those with high SAT-M
scores chose the sciences and engineering fields. This difference held
true both for choice of major and for later professional work. 69

Figure 10.12. The SMPY participants compared to graduate
students of the same age. A comparison of the accomplishments,
by age thirty-three, of SMPY participants who had scored 700 or
better on either the SAT-V or SAT-M, at age thirteen, to the
accomplishments of graduate students from highly ranked U.S.
university programs. From Lubinski and Benbow, 2006, Figure 3.

The mathematically precocious youth did 
not rest on their laurels. They were willing 
to work more hours than the graduate stu¬ 
dents to whom they were compared - even 
though mathematics and science graduate 
students, all in highly rated programs, are 
a notoriously hard-working group. Somewhat more men than women
qualified for the program. The disparity between men and women
increased sharply as a function of the SAT-M score. This can be seen
from an examination of Figure 10.11. The male/female ratio in cohorts 1
and 2, which were in the top 1 in 200 group, was 1.6 to one. The
male/female ratio in cohort 3, the 1 in 10,000 group, was 8.7 to one.
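
Ratios of this kind are easier to picture as percentages. The following sketch converts the two male/female ratios quoted above (1.6 to one and 8.7 to one) into the share of women in each group. It assumes every participant is counted as either male or female, which is how the ratios are reported.

```python
def share_of_women(males_per_female: float) -> float:
    """Fraction of a group that is female, given a male/female ratio of r to one."""
    return 1.0 / (1.0 + males_per_female)


RATIO_TOP_1_IN_200 = 1.6     # cohorts scoring 500 or better
RATIO_TOP_1_IN_10_000 = 8.7  # cohort scoring 700 or better

for label, ratio in (
    ("top 1 in 200", RATIO_TOP_1_IN_200),
    ("top 1 in 10,000", RATIO_TOP_1_IN_10_000),
):
    print(f"{label}: {share_of_women(ratio):.1%} women")
```

In other words, women made up a little under two-fifths of the 1-in-200 group but only about one-tenth of the 1-in-10,000 group.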

The SMPY participants tended to come 
from small families of relatively high SES. 
Their parents were much more likely to 
have college degrees, including advanced 
degrees, than would be the case for ran¬ 
domly selected students. These two relation¬ 
ships are not independent, for family size is 
negatively correlated with SES. 

69 Park, Lubinski, & Benbow, 2007. 


Only about 1% of the SMPY participants were African American or Latino.
This is comparable to the figure in Terman’s study, although the
percentage of African American and Latino students in the schools had
increased markedly between Terman’s recruitment and the SMPY
recruitment fifty years later.

Thirty-three percent of the participants in the SMPY and related
programs are Asian. 70 Nationally, Asians constitute about 4% of the
population. It has been claimed that Asians have a greater genetic
potential for intelligence, and especially for mathematical reasoning.
This claim is somewhat controversial. 71 However, genetic potential is
unlikely to have been a cause of the overrepresentation of Asians.
Eighty percent of the Asian participants had parents who had been born
and educated outside of the United States. Were the contrast to be due
to genetics, one would have expected a much higher percentage of Asians
whose parents had been born in the US. Nor is the contrast likely to be
due to some great deficiency in US schooling, for most of the non-Asian
participants had been educated in the US school system. This strongly
suggests that environmental factors influencing the home life of recent
Asian immigrants were a major factor in the children’s developing
interests and skills in academic pursuits.

70 Brody & Blackburn, 1996. Unfortunately these authors did not provide
figures for the various Asian groups.

71 See Flynn, 1991. This topic is discussed in more detail in
Chapter 11.

The Terman study and the SMPY are not 
the only studies of the gifted, but they are 
perhaps the largest, and their results are rep¬ 
resentative. People who do well on cognitive 
tests in their late childhood or early teens 
have quite bright prospects. High scorers 
tend to come from fairly high SES fami¬ 
lies, so some of their success may be due 
to the advantages of privilege. However, the 
accomplishments of the gifted are too sub¬ 
stantial to support a claim that advantage is 
all they have. Their own ability counts for a 
great deal. 

10.4.3. Developing the Gifted

A variety of acceleration programs have 
been developed to assist the gifted in reach¬ 
ing their potential. These vary from spe¬ 
cial summer courses, as in the case of the 
SMPY, to provision of accelerated tracks in 
public high schools and offering children 
as young as fourteen early admissions to 
universities. Gifted children perform very 
well in such programs. They also report 
enjoying them. There is little evidence for 
widespread social maladjustment, although 
participants in the early college entrance 
programs report preferring the company of 
their equally gifted age-mates to that of 
the considerably older college students. One 
particularly telling comment was made by 
Nancy Robinson, a professor at the Uni¬ 
versity of Washington who, together with 
her husband, Halbert, developed an early 
entrance program at that university. She 
noted that gifted students who came from 
the public schools benefitted from a pre¬ 
training program that prepared them for the 
pace of instruction at a major university. 
Why? Because they needed to develop study habits! They had been able to
get by in their regular schools without exercising the discipline
needed when they were thrown into the far less supportive atmosphere of
university instruction. 72

10.4.4. A Comment on Criticisms
of the Concept of the Gifted

On occasion the results of the studies of the 
gifted have been criticized in ways that I 
think are unfair or irrelevant. For instance, 
I have heard the Terman study criticized 
for not having identified a Nobel laure¬ 
ate! There are also widespread, although as 
far as I know undocumented, stories that 
two men who did subsequently win Nobel 
Prizes in science were overlooked during 
the selection of participants. Such criticisms 
set an unrealistic standard, by asking the 
researchers to predict a one in a million 
event, while ignoring major trends in the 
data. 

Another criticism is a claim that very high 
scores on cognitive tests do not predict any¬ 
thing. This belief is extremely widespread, 
even among psychologists. 73 It is false. In 
the SMPY there is a substantial difference 
between the accomplishments of the top 1 
in 200 among test scorers and the accom¬ 
plishments of the top 1 in 10,000. Similar 
findings were noticed by Terman for people 
with IQs over 170 as children. 

10.4.5. The Prospects for Individuals
with Low Test Scores

What about the other side of the coin, peo¬ 
ple in the low normal intelligence range, 
which I shall define as those whose IQs lie 
between 70 and 90? 

People in the low intelligence range are not automatically candidates
for assisted living or other institutional programs. However, more
people in the low intelligence range are found in welfare and
prison/jail populations than would be found if the welfare and
prison/jail populations were selected randomly from the population.
This does not mean that the majority of low intelligence individuals
are headed for some form of institutional control. It simply means that
they are at a greater risk of having these bad things happen than is
the case for a randomly chosen member of the population.

72 See Cronbach, 1996, and Robinson, 1996, for elaboration on these
points.

73 See Muller et al., 2005, and Vasquez & Jones, 2006, for examples of
such assertions.

As in the case of the gifted, the best way 
to understand the issues facing people in the 
low intelligence range is to examine prospec¬ 
tive studies, in which individuals in the low 
intelligence range are first identified, and 
then, hopefully along with a control group of 
average and above-average individuals, are 
followed for some time. Here are three such 
studies. 

In Herrnstein and Murray’s “Bell Curve” study 74 of the NLSY79 panel,
men and women took the AFQT in their mid to late teens. 75 The
Department of Defense uses these scores to classify people into
categories I-V, with categories IV and V covering the range of scores
below 90. About 45% of the NLSY79 men and women in categories IV and V
failed to obtain either a high school diploma or a General Educational
Development (GED) credential. The base rate for the entire NLSY79 panel
was 9%. Measures of workplace performance were similarly low.
Herrnstein and Murray concluded from this that intelligence, as
measured in the teenage years, is a major predictor of income as a
young adult. Other analyses of the same data 76 have agreed that
intelligence is indeed a predictor of income, but that ethnic status,
gender, and location in the country also have to be considered.

If we look at analyses of the intellectual demands of the workplace,
this is not surprising. Look back to Figure 10.10, which plots wages
against imputed intelligence requirements for over 800 jobs. Income is
fairly flat over the wide range of occupations that have low imputed
intelligence requirements. The figure also shows that there is a
considerable spread between the mean and ninetieth percentile of
earnings within occupations. To the extent that intelligence determines
within-occupation earnings, as we have seen that it does (section
10.3), one would expect people with low scores to earn less. And they
do.

74 Herrnstein & Murray, 1994.

75 See Chapter 2 for a discussion of the ASVAB and the AFQT.

76 See, e.g., the analyses in Devlin et al., 1998.

The picture is somewhat bleaker with 
respect to welfare. In this case we have 
to make a distinction by ethnicity, for wel¬ 
fare rates vary markedly with ethnic status. 
Herrnstein and Murray found that the rate at which White women in the
low intelligence range had received Aid to Families with Dependent
Children support was more than four times the rate in the NLSY79
sample as a whole.

A more detailed picture of what hap¬ 
pens to the low intelligence group in the 
workplace can be obtained by examining a 
Department of Defense study of how low 
intelligence males fared in military service, 
in a setting where working conditions are 
more precisely defined and recorded than 
they are in the civilian workforce. The study, 
Project 100,000, conducted in the late 1960s 
at the behest of Secretary of Defense Robert 
McNamara, was motivated by an important 
policy issue. 

The US military normally does not recruit 
Category V individuals, and is limited, by 
law, to recruiting a fixed percentage of Cat¬ 
egory IV soldiers. The argument for the pol¬ 
icy is that it does not make sense to go to 
the added expense of training and supervis¬ 
ing personnel who perform at a low level. 
On the other hand, the military forces have 
to have a certain number of recruits each 
year. If the category IV designation is a false 
indicator of military performance, exclud¬ 
ing these men and women would amount to 
ignoring a potential recruiting population. 

In Project 100,000 approximately 100,000 Category IV men (IQ range
roughly 80-90) were enlisted outside of the normal channels. Their
military careers were compared to those of a control group of enlisted
servicemen who matched them in age and educational status prior to
entry, and who had met normal recruitment standards. The control group
underrepresented the higher levels of AFQT scores (I and II) compared
to the percentages found in the general population, for there were no
officers in the study. 77 In civilian terms, Project 100,000 examined
the workplace performance of people involved in blue-collar and
lower-level white-collar occupations, excluding managerial positions
above the foreman level, and excluding professional occupations.

Table 10.8. Attrition rates during basic training (percentages) for
Project 100,000 participants and for the service as a whole, broken
down by military branch.

Branch of Service    Project 100,000 Enlistees    Overall Service Rate

Army                 3.7                          2.0
Marines              11.1                         4.4
Navy                 8.6                          2.8
Air Force            9.2                          3.0

Note: Data from Sticht et al., 1987. Figures are for the 1969-72 period, during
the Vietnam War.

In all military services the first thing that 
enlistees do is to go through recruit training 
or, as it is known in the Navy and Marines, 
boot camp. The ostensible goal of recruit 
training is to inform the enlistees about ser¬ 
vice customs and to provide them with a 
taste of the life they can expect in the future. 
This taste (and the service life that follows) 
varies greatly depending upon the service; 
the Army and Marines envisage a different 
life for recruits than do the Navy and the 
Air Force, and the Navy and Air Force dif¬ 
fer from each other. Boot camp has a sec¬ 
ond, less announced purpose. It serves as a 
screening device to see which enlistees have 
the adaptability to change from civilian to a 
more disciplined military life. 

Table 10.8 shows the attrition rate from boot camp during the period of
the Project 100,000 study, roughly 1969-71. Attrition rates were higher
for Project 100,000 than for the control group in every service.
Attrition rates also varied considerably across services. This may be
because of both differential recruitment by the services and
differences in basic training itself. Basic military training, strictly
construed, requires roughly a fourth-grade reading ability. Because the
Navy and the Air Force have more technical billets than the other
services, higher reading and mathematics requirements may have been in
force in those services. These requirements would be especially hard on
the Project 100,000 servicemen, compared to members of the control
group. By contrast, the Army and the Marines place more stress on
determining a recruit’s ability to follow instructions in physically
demanding situations.

77 Sticht et al., 1987.

Looking at the other side of the coin, 
the vast majority of the Project 100,000 ser¬ 
vicemen did complete basic training. From 
the viewpoint of the services, a rigid insis¬ 
tence on conventional standards would have 
excluded over 90,000 trainable enlistees. 
That would have been a serious loss. 
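
The two readings of the attrition data can be made concrete with a little arithmetic. The following Python sketch is an illustration only, not part of the analysis by Sticht et al. It takes the percentages in Table 10.8 (reading the Army figure for Project 100,000 as 3.7, which is an assumption) and computes, for each service, the percentage of Project 100,000 enlistees who completed basic training and the ratio of their attrition rate to the overall service rate. The names ATTRITION and summarize are hypothetical.

```python
# Percent attrition during basic training, as read from Table 10.8.
# Each entry is (Project 100,000 enlistees, overall service rate).
# Assumption: the Army figure for Project 100,000 is read as 3.7.
ATTRITION = {
    "Army": (3.7, 2.0),
    "Marines": (11.1, 4.4),
    "Navy": (8.6, 2.8),
    "Air Force": (9.2, 3.0),
}


def summarize(attrition):
    """Return {branch: (Project 100,000 completion %, attrition ratio)}."""
    summary = {}
    for branch, (project, overall) in attrition.items():
        completion = 100.0 - project   # percent who finished basic training
        ratio = project / overall      # how many times the service-wide rate
        summary[branch] = (completion, ratio)
    return summary


if __name__ == "__main__":
    for branch, (completion, ratio) in summarize(ATTRITION).items():
        print(f"{branch:<10} completed: {completion:5.1f}%   "
              f"attrition ratio: {ratio:4.2f}")
```

On these figures the completion rate is about 89 percent or higher in every service, while attrition runs roughly two to three times the service-wide rate. Both statements describe the same table.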

Modern military services are a microcosm 
of society. They contain some positions, 
such as infantryman, that are clearly unique 
to warfare. They also contain many posi¬ 
tions that either have exact civilian counter¬ 
parts, such as clerk or electronic technician, 
or have close analogs in civilian life, espe¬ 
cially outside of combat. We have already 
seen, from the analysis of civilian occupa¬ 
tions in section 10.3, that occupations vary 
a good deal in their demands for intelli¬ 
gence. Where did the Project 100,000 ser¬ 
vicemen wind up during their first term of 
service? 

Figure 10.13 shows that Project 100,000 
servicemen were much more likely to be 
assigned to nontechnical, nonspecialized 
jobs than were normally enlisted servicemen 
matched for pre-service education and eth¬ 
nic status. However, the evidence that this is 
Figure 10.13. Assignment of Project 100,000 and a matched control
group to different classes of military occupations during their first
term of enlistment. Based on data reported by Sticht et al., 1987.


due to their demonstrated abilities (or lack 
of them) after testing is not as strong as it 
could be. Assignments to technical billets 
are made based partly upon observations 
and interviews during recruit training, and 
partly upon a serviceman's ASVAB scores, 
obtained prior to entry. This means that the 
low scores of the Project 100,000 servicemen 
may have influenced their assignment. 

From the viewpoint of a scientific study, 
this was a flaw in the design. However, one 
can hardly fault the services for using the 
scores to make assignments in ways believed 
to minimize training costs. 

As in the case of the data on attrition, 
one can see two different things in the data 
on occupations. On the one hand, it shows 
(not surprisingly) that low intelligence men 
tended not to be assigned to technical and 
administrative occupations. On the other, 
they filled roles that are vital to the military 
mission. 

What happened after the servicemen 
completed recruit training? Figure 10.14 
compares the career progress of Project 
100,000 and control servicemen, in terms of 
rank attained through their first enlistment 
and, for those that stayed in the service, their 
status in 1983, which would have been from 
twelve to fourteen years after initial enlist¬ 
ment. The picture is similar to the earlier 
comparisons. The Project 100,000 servicemen
were generally not “top of the line,”
rapidly promoted military personnel. On the 


other hand, they did show progress in their 
careers. 

Project 100,000 was replicated ... accidentally.
During the 1980s a technical mistake
was made in determining the norms
for a revised version of the ASVAB. Before 
the mistake was discovered several thousand 
normally unqualified Category IV soldiers 
were enlisted into the Army. The careers of 
these soldiers have been followed, and by 
and large the results are the same as those 
obtained in the better-controlled Project 
100,000 study. 78 There would be little point 
in repeating the statistics. However, I will 
recount a hopefully informative anecdote to 
enliven the statistics. 

While serving as a consultant to a research 
project studying the accidentally recruited 
Category IV soldiers, I interviewed a lieu¬ 
tenant colonel who had commanded an 
armored battalion containing a high per¬ 
centage of Category IV soldiers. The colonel 
believed that Category IV soldiers performed
well when doing clearly defined tasks, even 
if they were quite detailed. He offered as an 
example the task of replacing the power unit 
on a tank. This task is done in over a dozen 
well-defined steps, always done in the same 
sequence. After training, Category IV sol¬ 
diers could replace the power unit. 

78 Sticht et al., 1987, discuss the renorming problem
and the resulting performance of the soldiers
involved in some detail.

























[Figure 10.14: bar chart. Horizontal axis: enlisted pay grades E1-E2, E3,
E4, E5, E6, E7, E8. Series: Project 100,000, 1st enlistment; Control,
1st enlistment; Project 100,000, 1983; Control, 1983.]

Figure 10.14. The advancement in rank of Project 100,000
personnel. The percentage of servicemen in the Project 100,000 and
control groups who had reached different enlisted pay grades either
during their first enlistment or after approximately thirteen years of
service. Pay grades E1 and E2 were the grades typically assigned
immediately upon completion of recruit service (e.g., private or
private first class in the Army and Marines). Pay grade E4 (sergeant
in the Army or Marines) was considered satisfactory service during
the first enlistment. The higher pay grades represent staff
noncommissioned officers (NCOs) or, in the Navy, Petty Officers.
Grade E8, Master Sergeant or Chief Petty Officer, was a position
with considerable prestige and responsibility.


More generally, the colonel believed that 
Category IV soldiers could be trained to do 
tasks where the instructions were “do this, 
then do that, then the other thing.”

The colonel thought that Category IV sol¬ 
diers had trouble with tasks that are defined 
by the end to be accomplished, rather than 
by the steps to be taken. He offered as an 
example the task of recalibrating a gun sight 
that has been knocked out of alignment with 
the barrel of a tank’s cannon. The instruc¬ 
tions for this task began, “Find a way to 
fix the sight rigidly on the tank’s hull.” This 
instruction defines the goal to be accom¬ 
plished and leaves it up to the soldier to find 
a way of accomplishing the goal. 

If we place this in a more psychologi¬ 
cal framework - for the colonel certainly 
did not use such words - the Category IV 
soldier could learn what to do in a well- 
defined situation but had trouble anticipat¬ 
ing what would happen if he took a particu¬ 
lar sort of action in a more poorly defined 
situation. This sort of deficit appears to 
be characteristic of low intelligence people 


in schools, the military, and the civilian 
workplace. Depending upon whether you 
approach intelligence from the viewpoint 
of the psychometrician, the information¬ 
processing psychologist, or the cognitive 
neuroscientist, you can say that people 
with low intelligence have trouble with 
tasks that have a high demand for g, tasks 
that involve the working memory/executive 
function class of behaviors, or tasks that 
place demands on the forebrain-cingulate 
cortex circuit. All three statements amount 
to the same thing. 

10.5. A Concluding Comment on 
Intelligence and the Workplace 

Someone who says that intelligence, as 
assessed by standard cognitive tests, is irrel¬ 
evant to performance in either academia 
or the workplace is simply wrong. So is 
someone who says that intelligence does not 
amount to very much, compared to a vari¬ 
ety of personality characteristics. The data 
both within and across occupations show
that measures of intelligence are among the 
best, and most often the best, predictors of 
academic and occupational success. On the 
other hand, no one has claimed that cogni¬ 
tive tests are perfect predictors. Predictive 
validity correlations are in the .4-.6 range,
which is much better prediction than can be 
achieved with any personality measure, but 
is still far from perfect. 
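
To get a feel for what a validity coefficient in this range implies, here is a minimal sketch. It rests on an assumption not made in the text, namely that test score and outcome follow a bivariate normal distribution. Under that idealization the share of outcome variance accounted for is r squared, and the probability that a person above the median on the test is also above the median on the outcome is 1/2 + arcsin(r)/pi. The function names are hypothetical.

```python
import math


def variance_explained(r):
    """Proportion of outcome variance accounted for by a predictor
    whose correlation with the outcome is r."""
    return r * r


def prob_outcome_above_median(r):
    """P(outcome above its median | predictor above its median),
    assuming predictor and outcome are bivariate normal with
    correlation r."""
    return 0.5 + math.asin(r) / math.pi


if __name__ == "__main__":
    for r in (0.4, 0.5, 0.6):
        print(f"r = {r:.1f}   variance explained = {variance_explained(r):.2f}   "
              f"P(above median | above median) = "
              f"{prob_outcome_above_median(r):.3f}")
```

For r = .5 the second quantity is exactly two-thirds: clearly better than a coin flip, and clearly short of certainty.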

The same message can be extracted from 
studies of extremes. On the whole, people 
who have high intelligence test scores do 
quite well. On the average the gifted get bet¬ 
ter jobs and make more money. The Terman 
study, and all studies of the gifted afterward, 
gave the lie to the stereotype of a gifted per¬ 
son as a neurotic introvert with health prob¬ 
lems. The contrast between the SMPY 1 in 
200 and 1 in 10,000 groups shows that the 
tests have predictive power at very high lev¬ 
els, directly contradicting statements to the 
contrary by psychologists who do not, them¬ 
selves, study individual differences. 

Here are some documented statistics, 
but on a very small sample, so the report 
falls somewhere between a scientific fact 
and an anecdote. In the 1960s the National
Aeronautics and Space Administration conducted
an intensive screening of volunteers to
become the first American astronauts, the
MERCURY and GEMINI programs. The
selected candidates had WAIS IQs averaging 
135. Unselected candidates had IQs averag¬ 
ing 131. A control group of comparably aged 
aviators averaged 118. Being intelligent is part 
of having the right stuff. 79 

At the opposite end of the distribution, 
people in the low intelligence range gener¬ 
ally have difficulty with school, especially 
if they are placed on an academic track, 
and usually take occupations that do not 
make high cognitive demands. They earn 
less than people in the normal-high intel¬ 
ligence ranges and are more likely to require 
some form of welfare assistance. 

It cannot be stressed too strongly, though, 
that these are trends. Every study of the 
extremes comes up with exceptions. There 

79 Santy, 1994, p. 276. 


are people who do quite well although they 
had modest test scores, and there are stun¬ 
ning examples of people with high scores 
who never live up to their promise. 

Given these facts, why is there a 
widespread belief that intelligence does not 
count for very much in life? I think, but can¬ 
not prove, that several factors are involved. 

One factor is the failure of unrealistic
expectations. We may expect the gifted person
to be a casual genius who can solve difficult
problems without much care or effort.
That is not the case. Two of the personal¬ 
ity characteristics of the gifted are that they 
generally enjoy their work, and that they 
work very hard at it. The brilliant genius 
who, without training, knows at a glance 
the answer to difficult problems in math¬ 
ematics, physics, or what have you is a very 
rare bird. 80 

People without statistical training have a 
hard time grasping the concept of something 
that increases the probability of an event 
but does not establish its certainty. Thus if 
we can think of examples of people with 
high test scores doing stupid things, or peo¬ 
ple with low test scores doing good things, 
that is taken as proof that the predictors do 
not work. In the newspaper business “Man 
bites dog” is news, “Dog bites man” is not. 
The unusual is publicized and sticks in our 
minds. The prosaic does not. 

We may have an unrealistic idea about 
the extent to which personal characteristics 
determine success. Large-scale social and 
economic forces, and idiosyncratic imper¬ 
sonal events, can play a great part. The 
various reports of the Terman group stress 
how much these highly intelligent people 
were influenced by having come of age dur¬ 
ing the Great Depression and then, espe¬ 
cially for the men, having had to deal with 
World War II. The SMPY group has grown 
up in times of relative peace and economic 
expansion, at least until 2010. Such things 

80 It is possible that a very few such individuals exist. 
The best-documented case study is of the Indian 
mathematician Ramanujan, who made major
contributions to mathematics even though he was
self-taught. He also spent a great deal of time working
on mathematical problems. 
influence one’s success, quite outside of personal
traits. While it is true that we partly
make our own environments, we are partly 
stuck with them. 

People are heavily influenced by their 
personal experiences. Charles Murray has 
pointed out that we live in a society that is 
sharply stratified by intelligence. 81 College- 
educated people, by and large, deal with 
other college-educated people, and people 
with high school educations deal with each 
other. Within the restricted range of intelli¬ 
gence that people can observe directly, other 
variables may account for more variation 
in performance than intelligence does. It is 
only when we step back and look at the big 
picture that the importance of intelligence 
becomes clear. 
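
The restricted-range point can be illustrated numerically. The sketch below is a textbook idealization, not an analysis from the sources cited here. It assumes that intelligence and performance are bivariate normal in the whole population and that an observer sees only people above some cutoff on intelligence. The function name restricted_correlation and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def restricted_correlation(r, top_fraction):
    """Correlation between X and Y among people in the top `top_fraction`
    of X (0 < top_fraction < 1), assuming X and Y are bivariate normal
    with population correlation r."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - top_fraction)
    # Mean of X in the selected group (inverse Mills ratio).
    selected_mean = std_normal.pdf(cutoff) / top_fraction
    # Variance of X in the selected group, relative to a population
    # variance of 1 (variance of a truncated standard normal).
    var_x = 1.0 + cutoff * selected_mean - selected_mean ** 2
    # Direct selection on X shrinks the variance of X but leaves the
    # residual variance of Y unchanged.
    return r * math.sqrt(var_x) / math.sqrt(1.0 - r * r + r * r * var_x)


if __name__ == "__main__":
    for fraction in (0.5, 0.25, 0.1):
        print(f"top {fraction:.0%} of the distribution: "
              f"r = {restricted_correlation(0.5, fraction):.2f}")
```

With a population correlation of .5, looking only at the top quarter of the distribution shrinks the observed correlation to about .27 under these assumptions.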

My final speculation is that some peo¬ 
ple, for understandable reasons, do not want 
to find that intelligence has an effect upon 
success. 

Many people in post-industrial society, 
and especially in post-industrial American 
society, hold a belief and have two attitudes 
that, combined, provide a motivation for 
rejecting intelligence as a (partial) explana¬ 
tion of workplace success. The belief is that 
intelligence is something that is more or less 
fixed for life. As was discussed in Chapter 9, 
and will be discussed further in the section 
on aging in Chapter 11, this is a false belief - 
especially about intelligence in the concep¬ 
tual sense of the problems that people can 
solve, rather than in the narrower sense of 
a test score. But that is not what people 
think. When one combines a belief in the 
permanence of intelligence with an attitude 
of distrust of elites, it becomes almost nec¬ 
essary to argue that intelligence is of little 
relevance in life, for to do otherwise can be 

81 Murray, 2008. 


seen as an affirmation of the appropriateness 
of rewarding an elite class of thinkers. 

The second attitude is quite different. It 
has to do with a sincere desire for equal 
opportunity for all. 

There are marked differences in the typ¬ 
ical cognitive test scores obtained by mem¬ 
bers of different racial and ethnic groups. 
There are much smaller, rather complex dif¬ 
ferences in test scores between men and 
women. Up to the middle of the twentieth 
century these differences were used to jus¬ 
tify varying degrees of segregation of minor¬ 
ity groups, in areas including admission to 
universities and the granting of specialized 
degrees, and to place less strict, but still 
important, restrictions on women’s oppor¬ 
tunities. Since that time overt discrimina¬ 
tion has virtually stopped, but there are peo¬ 
ple who have used group differences in test 
scores as evidence for the proposition that 
group differences in educational and profes¬ 
sional achievement are largely due to group 
differences in intelligence. 82 To the extent 
that this conclusion is correct, provision of 
equal opportunity for all groups in society 
will not produce an equal distribution of 
social and economic rewards across groups. 

Many people who are deeply commit¬ 
ted to social equality find such a conclusion 
offensive. It is difficult for them to argue 
about the fact of differential distribution of 
test scores. Therefore, they deny the rel¬ 
evance of scores to social outcomes. This 
denial cannot be maintained. 

The discussion of racial and ethnic differ¬ 
ences in intelligence, and male:female differ¬ 
ences in intelligence, raises extremely com¬ 
plex issues. We take them up in the next 
chapter. 

82 See, for instance, Murray, 2005; Rushton & Jensen, 

2005; and Lynn and Vanhanen, 2002, 2006. 


CHAPTER 11


The Demography of Intelligence 


By nature men are nearly alike. By practice 
they get to be far apart. 

- Confucius (attribution in 
Bartlett's Familiar Quotations, 
15th ed., 1980) 

People are not all the same. There are young 
people, old people, and people in between. 
Nevertheless, we do make distinctions, rang¬ 
ing from binary decisions about the right 
to vote to the minimum age at which one 
qualifies for a pension. Many of these dis¬ 
tinctions are based on the underlying pre¬ 
sumption that intelligence grows, and then 
declines. There are mandatory retirement 
age requirements for air traffic controllers 
and commercial aviators, largely because of 
our beliefs about changes in their cogni¬ 
tive capabilities. In the United States federal 
judges are appointed for life. Many of the 
individual states have mandatory age limits 
for judges. 

There are men and women, a sharper bio¬ 
logical distinction than young and old. His¬ 
torically, many societies have assumed that 
men and women have different cognitive 


capacities. In contemporary post-industrial 
societies there is a presumption of equal¬ 
ity. But does “equality” mean “identity?” 
Should we regard a low incidence of women 
in corporate law practice with suspicion? 
Should the acceptable percentage of women 
in corporate law practice be the same as the 
acceptable percentage of women working as 
helicopter pilots? 

People are identified with, and self- 
identify with, various racial and ethnic 
groups. Do these groups differ in intelli¬ 
gence? To some people the answer is obvi¬ 
ous - they do. Other people believe that 
even the suggestion of a difference is evi¬ 
dence that the speaker is prejudiced. World¬ 
wide, we are organized into nations. Do the 
people of the world all have the same intel¬ 
ligence? Herodotus, the father of history, 
thought that easy climates produce people 
with soft minds and little resolve. Richard 
Lynn, a modern professor at the Univer¬ 
sity of Ulster, has given new life to this 
argument. 1 


1 Lynn, 2006. 





And identifying group differences is sel¬ 
dom enough. We want to know why differ¬ 
ences occur. 

This chapter describes the scientific evi¬ 
dence regarding the relation between intel¬ 
ligence and demographic variables: age, sex, 
race, and ethnicity. I shall try to present 
an objective discussion of these contentious 
issues. 

11.1. The Issues Involved 

I will not present any simple one-liner 
answers to any issue involving group differ¬ 
ences. In my opinion none exists. A person 
who makes a sweeping assertion about the 
existence or cause of age, sex, racial/ethnic, 
or national differences should be regarded 
with extreme suspicion. Throughout I have 
tried to separate discussions of scientific fact 
from hypotheses and from discussions of 
policy. Maintaining the distinction between 
scientific findings and policy recommenda¬ 
tions is very important. Psychologists can, 
for instance, inform lawmakers about the 
facts that reaction times increase and the 
ability to control attention drops as people 
age. Whether or not there should be upper 
age limits on automobile driving is a policy 
decision. 

Because the study of group differences 
can be so contentious, it is a good idea to set 
the ground rules by discussing some prob¬ 
lems of analysis and interpretation that keep 
coming up in this area of investigation. The 
rules fall into two categories: general prin¬ 
ciples and issues of interpretation. Panel 11.1 
sets forth some general principles. The rest 
of this section focuses on issues of interpre¬ 
tation. 

11.1.1. Distinguishing between Cognitive 
and Noncognitive Effects 

Intelligence is a “can do” concept. The dis¬ 
tinction between “can do” and “will do” is 
relevant to the study of group differences, 
because when group differences in perfor¬ 
mance are observed it may not be clear 


whether they are due to differences in intel¬ 
ligence or motivation. 

The case of Asian-Americans in the 
United States provides a good example. As 
a group, Asian-Americans are represented 
in higher education far out of proportion 
to their frequency in the population. In 
autumn of 2008 Asians made up 35% of the 
student body at the University of Califor¬ 
nia, Berkeley. Asians constitute just over 4% 
of the US population, and 12% of the pop¬ 
ulation of California. 2 Some have attributed 
the overrepresentation of Asians in selective 
universities to Asian intelligence, for Asians 
as a group score slightly above the popu¬ 
lation mean on some g-loaded intelligence 
tests. This point is discussed in more detail in 
section 11.3. But is this because of a biological 
capacity? One also has to consider cultural 
traditions and expectations about the advantages
of education that may provide Asian-Americans
with more-than-average motivation to succeed
in cognitive endeavors. 3

A good deal has been made of the pos¬ 
sibility that some people consistently score 
below their capabilities on cognitive tests 
and other measurements of achievement 
because they just do not try. The phe¬ 
nomenon is by no means limited to cog¬ 
nition; if people have a mindset to expect 
low achievement, they will not work hard to 
become high achievers in almost any field. 4 
In the case of group differences, it has been 
claimed that a particular type of negative 
mindset, stereotype threat, may be one of 
the reasons that certain groups, including 
African Americans and women, tend not to 
do well on tests of mathematical reasoning. 
They believe that they are not the sort of 
people who do mathematics well, so they 
are willing to disengage from the hard work 
of problem solving required to obtain high 
scores on mathematics tests. 5 

2 Information provided by the University of California,
Berkeley, at the website osr2.berkeley.edu/
twiki/bin/view/Main/Fall2008EthnicDistribution,
and by the 2000 US Census.

3 Flynn, 1991; Sue & Okazaki, 1990. 

4 Dweck, 2006. 

5 Steele & Aronson, 1995, 1998. 
Panel 11.1. Some Principles for
Guiding Research on Group
Differences in Intelligence

The following rules are abridgments of
suggestions about conducting research on
group differences that my colleague Prof.
Jerry Carlson, of the University of California,
Riverside, and I made in an article
in the journal Perspectives on Psychological
Science.* Our ideas were attacked on the
somewhat contradictory grounds that our
guidelines would unduly restrict inquiry†
and that they would encourage delving
into questions that are not scientific and
that we did not understand.‡ We made
a rejoinder.§ I encourage readers interested
in this issue to look at the original
articles. Here I simply paraphrase the
guidelines.

1. The measures must have construct 
validity, that is, it should be clear why 
they measure what they purport to 
measure. 

2. Measurements must be valid in all 
groups involved. See the main text 
for a discussion of what this means. 

3. The fact that a score can be changed 
by training is not evidence against an 
inherent group difference unless the 
altered score is as valid a measure as 
the original score. 

4. Generalization to populations 
depends crucially upon the relation 
between the sample and the population.
“Any old convenient sample”
will not do. A generalization from 
observations of a group difference in 
college students, for instance, to a 
conclusion about group differences 
for people of all ages is not automat¬ 
ically valid. 


5. Literature summations must be done
carefully, with special attention to 
research results that do not con¬ 
form to the reviewer’s conclusions. 
Complete objectivity is impossible, 
but discussants in a scientific debate 
should strive for this ideal. 

6. Alternative hypotheses and models 
should be considered. 

7. The alternatives should duly repre¬ 
sent the original authors’ ideas. Straw 
models should not be set up in order 
to be knocked down. This has been 
a particular problem in the study of 
group differences. For instance, peo¬ 
ple sometimes attack the position 
that group differences are either enti¬ 
rely genetic or entirely environmen¬ 
tal, whereas the real question is how 
much the environment or genetic 
inheritance influences the difference. 

8. Heritability coefficients are measures 
of the relative size of genetic and 
environmental influence, and can 
vary across populations. 

9. When a policy recommendation is 
made, one’s policy model, includ¬ 
ing attitudes about desirable conse¬ 
quences, should be stated. 

10. Be willing to say “We don’t know.”
In many cases we do not know what 
causes a group difference. There are 
some cases, especially involving the 
evolution of cognition, where it is 
unlikely that we shall ever know. 
Be willing to acknowledge such sit¬ 
uations. Acknowledging ambiguity is 
not a sign of weakness, it is a sign of 
intelligence! 

* Hunt & Carlson, 2007a.

† Gottfredson, 2007a.

‡ Sternberg & Grigorenko, 2007.

§ Hunt & Carlson, 2007b.


There is no doubt that stereotype threat 
exists. The effect has been demonstrated 
in laboratory studies involving racial/ethnic 
groups, men and women, and aging par¬ 


ticipants. There is considerable controversy 
over whether the phenomenon occurs when 
students take a high stakes test, such as 
a college entrance examination, because 
when the stakes are high external motivating
factors may override any internal willingness 
to give up. The evidence for and against 
stereotype threat effects will be discussed 
later. Here I bring up the problem as an 
example of how behavior will depend upon 
both “can do” and “will do.” Before we can
use a difference between two groups in test 
scores, or any other cognitive achievement, 
as evidence of a difference in underlying 
intelligence, we must be sure that motiva¬ 
tion was equivalent in each group. 

11.1.2. Recruitment Effects

It is probably true that more studies have 
been carried out on college students than 
on any other population. For instance, a 
great deal of our information about the dif¬ 
ferences in cognitive abilities of men and 
women is based on observed differences 
between male and female college students. 
Such studies are subject to recruitment effects 
that limit their generality. Let us look at such 
situations abstractly. 

In the typical study of group differ¬ 
ences an investigator identifies some acces¬ 
sible population (e.g., college students) that 
contains members of the groups of inter¬ 
est. Obviously, there are male and female 
college students. Participants are selected 
from the groups of interest within the 
accessible population. Group differences are 
then observed. Assuming a random selec¬ 
tion from the accessible population, con¬ 
ventional statistical methods can be used to 
justify making a general statement about the 
accessible population. But what about con¬ 
clusions about group differences in the pop¬ 
ulation at large? The validity of generaliza¬ 
tion depends upon how group members are 
recruited into the accessible population. 

Here is a concrete example. Two Cana¬ 
dian psychologists, Douglas Jackson and J. 
Philippe Rushton, found that in the population
of people who had taken the SAT,
men had higher scores than women. 6 Does 
this mean that men in their late teens (the 
age at which the SAT is usually taken) are 

6 Jackson & Rushton, 2006. 


generally more intelligent, as assessed by the 
SAT, than women of the same age? The con¬ 
clusion would be valid if young men and 
women, in the population at large, were 
equally likely to take the SAT. But they 
are not. Since 1980 the majority of college 
students have been women. It may be that 
the most intelligent 60% of women attempt 
to go to college, compared to the most 
intelligent 40% of men. If this is so, the 
recruitment process is different for men and 
women. 

In such a case comparison of the SAT 
scores would give a false picture of the popu¬ 
lation difference. If there are no male-female 
differences in intelligence in the general 
population, the top 60% of the female popu¬ 
lation would be expected to score lower, on 
the average, than the top 40% of the male 
population. Of course, this is an oversim¬ 
plification. A more sophisticated model of 
male-female differences in the recruitment 
process for taking the SAT has been verified. 
Applying it, my colleague Tara Madhyastha 
and I found that Jackson and Rushton's 
results could not be used to argue that there 
are differences in intelligence between men 
and women in general, although they could 
be used to argue that there are differences in 
intelligence within the college population. 7 
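
The oversimplified version of the argument can be worked through directly. The sketch below is not the model that was applied to the Jackson and Rushton data. It assumes that scores are normally distributed with the same mean and standard deviation in both sexes, and that exactly the top 60% of women and the top 40% of men take the test. The function name mean_of_top_fraction is hypothetical.

```python
from statistics import NormalDist


def mean_of_top_fraction(fraction, mu=0.0, sigma=1.0):
    """Mean score of the top `fraction` (0 < fraction < 1) of a normal
    population with mean `mu` and standard deviation `sigma`."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - fraction)
    # Mean of a standard normal truncated below at `cutoff`.
    return mu + sigma * std_normal.pdf(cutoff) / fraction


if __name__ == "__main__":
    women = mean_of_top_fraction(0.60)  # top 60% of women take the test
    men = mean_of_top_fraction(0.40)    # top 40% of men take the test
    print(f"mean of female test takers: {women:+.2f} SD")
    print(f"mean of male test takers:   {men:+.2f} SD")
    print(f"apparent male advantage:    {men - women:.2f} SD")
```

Under these assumptions the male test takers average about 0.32 standard deviations higher than the female test takers, even though the two populations are identical by construction.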

Recruitment effects can be very large. 
They affect all types of group differences 
that we will consider: age differences, male- 
female differences, racial/ethnic differences, 
and international comparisons. 

11.1.3. Establishing Causation

Students in the social sciences learn to recite, 
almost as a mantra, “Correlation does not 
mean causation.” The rooster may crow 
before dawn, but we do not conclude that 
the rooster’s crowing makes the sun rise. 
In laboratory situations we avoid confus¬ 
ing correlation with causation by conducting 
controlled experiments, where one possi¬ 
ble causal factor is systematically changed as 
other factors are held constant, or by assign¬ 
ing participants randomly to experimental 

7 Hunt and Madhyastha, 2008. 
and control groups. This does not work for
the study of group differences. We cannot 
assign people randomly to be young or old, 
male or female, Black or White, American 
or British. They come as they are. And when 
they do, they typically come with many dif¬ 
ferences besides their group membership. 
This makes understanding some differences 
extremely difficult. 

There are collinearities between group 
membership and a host of variables. Con¬ 
sider the relation between intelligence and 
national wealth. (We will go into this in 
considerable detail in section 11.5.) There is 
a fairly high correlation between national 
wealth, measured by gross domestic product
per person (GDP/c), and national average
scores on intelligence tests. There is
also a high correlation between mean intelligence
test scores and indices of physical
health, scholastic achievement,
and GDP/c. What are the causal relations? 
One could argue that intelligence causes 
wealth, or that school achievement causes 
wealth, or that wealth makes possible good 
schools, which in turn cause intelligence to 
rise. 

Unfortunately, there have been studies in 
which investigators simply ignore the prob¬ 
lem. They seize upon a single causal variable, 
establish that it has a correlation with, say, 
intelligence, and spin a plausible tale about 
why this is so. Historically, this is the sort 
of reasoning that led people to believe that 
malaria (Italian, mal aria = bad air) was due
to dank, hot air because there was a corre¬ 
lation between humid conditions and out¬ 
breaks of the disease. We now know that 
the disease is borne by mosquitoes. Empha¬ 
sizing isolated correlations might have been 
justified in the days before the technologies 
for multivariate statistical analyses had been 
developed. They are not justifiable today. 

Collinearity issues are partly addressed in 
quasi-experimental designs, in which we find 
two groups that appear to be equal on all the 
variables that we believe are relevant, except 
one, which may or may not be an exper¬ 
imental manipulation. This is the sort of 
design used when teaching methods are con¬ 
trasted in comparable school districts, with 


one method being used in one district and 
another method in another. Similar cases 
appear in the literature on intelligence. For 
instance, one of the pieces of evidence used 
to argue that schooling influences general 
reasoning is that in the early twentieth cen¬ 
tury children's IQ test scores were higher in 
isolated rural communities that had schools, 
compared to the scores of children who did 
not have access to schools. 8 The argument
is valid to the extent that the different com¬ 
munities really were equated on variables, 
such as nutrition and health, that could also 
influence test scores. 

Collinearity issues can be addressed by 
means of a statistical technique known as 
causal modeling, which is used to deter¬ 
mine which of several explanations best fits 
the data. For instance, the previous chap¬ 
ter reported an investigation of the relation 
between socioeconomic status (SES), intelligence
(as indexed by the SAT), and academic
achievement. 9 Three causal models
were considered: one in which intelligence 
and SES had independent influences on aca¬ 
demic achievement, one in which intelli¬ 
gence had an influence only as a proxy for 
SES, and one in which SES had no direct 
influence on academic achievement, beyond 
any influence that it had on intelligence. The 
third model was shown to be the most accu¬ 
rate. Note that the question has shifted from 
finding a statistically significant relationship 
to the more sophisticated issue of selecting 
the best model. 
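
The logic of model selection can be sketched in a few lines. This is a deliberately stripped-down stand-in, not the analysis reported in the study cited: real causal modeling uses structural equation software, whereas here three simple regression models are compared by the Bayesian information criterion (BIC). The data are simulated under the assumption that SES affects achievement only through the test score, and every coefficient is invented.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bic(n, r_squared, n_predictors):
    """BIC of a linear regression, up to a constant shared by all models
    of the same outcome: n * ln(1 - R^2) + k * ln(n). Lower is better."""
    return n * math.log(1 - r_squared) + n_predictors * math.log(n)

rng = random.Random(1)
n = 2000
# Assumed data-generating process: SES -> test score -> achievement.
ses = [rng.gauss(0, 1) for _ in range(n)]
score = [0.5 * s + rng.gauss(0, 1) for s in ses]
achievement = [0.6 * t + rng.gauss(0, 1) for t in score]

r_as = pearson(achievement, ses)
r_at = pearson(achievement, score)
r_st = pearson(ses, score)
# R^2 for the two-predictor regression, from the three correlations.
r2_both = (r_as**2 + r_at**2 - 2 * r_as * r_at * r_st) / (1 - r_st**2)

models = {
    "SES and test score both direct": bic(n, r2_both, 2),
    "SES only (score is a proxy)": bic(n, r_as**2, 1),
    "test score only (no direct SES path)": bic(n, r_at**2, 1),
}
for name, value in sorted(models.items(), key=lambda kv: kv[1]):
    print(f"{value:10.1f}  {name}")
```

The point of the exercise is the ranking, not any single significance test: each candidate model is judged against its rivals.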

Causal modeling is a powerful technique 
but is not perfect. When all is said and 
done, causal models are based on corre¬ 
lations, and correlation is still not causa¬ 
tion. Causal modeling does not establish 
what the truth is. The technique gains its 
power by rejecting initially plausible mod¬ 
els for better ones, not by rejecting the null 
hypothesis, which almost no one believes in 
anyway. 

Numerous studies have been done using 
what at first appears to be a classic experi¬ 
mental design. Members of different groups 

8 Ceci & Williams, 1997. 

9 Sackett et al., 2009.

are recruited; within each group participants
are randomly assigned to experimental and 
control conditions, and the result of the 
experiment-control contrast is compared 
across groups. Stereotype threat has been 
investigated in this way. People in an 
affected group (women, African Americans, 
etc.) are either reminded of their group iden¬ 
tity or not reminded of it, and then given 
a test on which members of the group are 
said to do poorly. The same logic could 
be used to compare test scores obtained by 
people who had been given either a newly 
invented “IQ pill” or a placebo. In an extension
of this design, the same experimental-
control contrast is applied within different 
groups, for instance, in an investigation con¬ 
trasting two teaching methods in groups of 
Black and White students. 

Studies of this sort are excellent ways of 
demonstrating that a particular manipula¬ 
tion can (or cannot) have an effect on test 
scores. They do not show whether the effect 
actually occurs, or how large it is compared 
to other effects, in the world outside the 
laboratory. 

No method for investigating group differ¬ 
ences in intelligence is perfect. They all have 
something to contribute. The strengths and 
weaknesses of the various designs have to 
be kept in mind as we consider the many 
results that have been obtained in the study 
of group differences in intelligence. 

11.1.4. Statistical and Measurement Issues 

In evaluating studies of group differences 
two statistical issues are of importance. They 
deal with measurements of the size of a 
group difference and the appropriateness of 
using tests as a way of assessing intelligence 
in different groups of people. 

ESTABLISHING THE SIZE OF GROUP 
DIFFERENCES 

In virtually all cases we shall consider, estimates
of intelligence in one group will not be
uniformly higher or lower than estimates
in another group. In fact, this is true of most
human traits. What we need is some mea¬ 
sure of the extent to which two populations 


overlap. The difference between means in 
deviation units, d, does this. 

Consider two arbitrary groups, with
mean scores on some test $M_1$ and $M_2$, and
with within-group standard deviations $S_1$ and
$S_2$. The deviation score, d, is defined as 10

$$ d = \frac{M_1 - M_2}{\sqrt{(S_1^2 + S_2^2)/2}} \qquad (11.1) $$


The d measure provides a picture of the 
extent to which two groups overlap. The 
larger the value of d, the less the overlap. 
This is illustrated in Figure 11.1, which shows 
the overlap between three groups that are 
ordered with respect to their means, and 
have identical standard deviations within 
each group. 
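
A short sketch may make the link between d and overlap concrete. It assumes that d is the difference between means divided by the root mean square of the two within-group standard deviations (the form appropriate to equal group sizes), and that both distributions are normal with equal spread. The function names and the sample scores are invented for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def cohens_d(group1, group2):
    """Mean difference in pooled standard deviation units (equal group sizes assumed)."""
    s1, s2 = stdev(group1), stdev(group2)
    pooled = math.sqrt((s1 ** 2 + s2 ** 2) / 2)
    return (mean(group1) - mean(group2)) / pooled

def overlap(d):
    """Shared area under two normal curves with equal SD whose means differ by d SDs."""
    return 2 * NormalDist().cdf(-abs(d) / 2)

# Invented scores for two small groups.
group_a = [100, 105, 110, 115, 120]
group_b = [95, 100, 105, 110, 115]
d_ab = cohens_d(group_a, group_b)
print(f"d = {d_ab:.2f}, overlap = {overlap(d_ab):.2f}")

for d in (0.0, 0.5, 1.0, 2.0):
    print(f"d = {d:.1f}: overlap = {overlap(d):.2f}")
```

Under these assumptions a d of 1 still leaves well over half of the two distributions in common, which is why a difference in means says little about any particular pair of individuals.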

The d measure is a measure of the size 
of an effect, just as the correlation coeffi¬ 
cient, r, is a measure of the size of an effect. 
Before interpreting a finding that d is not 
large enough to be statistically reliable at 
the .05 or .01 level, it has to be shown that 
the study had adequate statistical power to 
detect the smallest difference of interest. All 
the concerns about power that applied to r, 
and were introduced in the previous chap¬ 
ter, also apply to d. 
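
As a rough guide to what "adequate power" means for d, the following sketch uses the normal approximation to the two-group t test. It is an approximation only (slightly optimistic for small samples), and the function names are hypothetical rather than taken from any statistics package.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two group means,
    given the true effect size d and the number of people in each group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * sqrt(n_per_group / 2)   # expected value of the test statistic
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def n_needed(d, target_power=0.80, alpha=0.05):
    """Smallest per-group sample size whose approximate power reaches the target."""
    n = 2
    while approx_power(d, n, alpha) < target_power:
        n += 1
    return n

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_needed(d)} people per group for 80% power")
```

A study with a few dozen people per group has little chance of detecting a small difference, so a nonsignificant d from such a study says almost nothing about whether a small difference exists.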

THE APPROPRIATENESS OF USING THE 
SAME TEST IN DIFFERENT GROUPS 
If we want to measure group differences on a 
physical variable, such as height or weight, 
the principle is simple. We use a physical 
device to make a physical measurement. If 
we want to compare, say, the heights and 
weights of Asians to the heights and weights 
of Africans, there is no concern that tape 
measures and scales would behave differ¬ 
ently across groups. The problem is subtler 
when it comes to cognitive testing, because 
different types of people may systematically 
respond to a test in different ways, thus 
tapping different underlying psychological 
variables. 

10 The equation shown applies to a contrast between 
two groups of equal size. The case of unequal group 
sizes requires a slight complication in equation 11.1, 
but poses no new conceptual issue.

Figure 11.1. An illustration of the overlap between distributions. All
distributions have an arbitrary standard deviation of 1. Distribution
A has a mean at 0 on the abscissa, distribution B has a mean at −1,
and distribution C has a mean at +1. The d (deviation) value for
A − B is 1, for A − C is −1, and for B − C is −2. The amount of
overlap decreases as the absolute value of d increases.


To illustrate this, imagine that we decide 
to develop a direction-taking test to measure 
people’s ability to orient in space. The test 
might contain items like this: 

Go to the flagpole to the north of your 
present position. 

Then go east to the church. 

Turn north at the church and proceed to 
the school. 

We then see how long it takes people to 
get to the school, and how many deviations 
they make from the prescribed route. 

Remember that we are not interested in 
direction-taking per se. We are interested in 
performance on the direction-taking test as 
a measure of a more general ability to orient 
in space. 

If we use a contrast between the speeds 
with which men and women arrive at the 
school, we have implicitly assumed that 
men and women will approach the test in 
the same way. But there is evidence to the 
contrary. Men often approach problems in 
direction taking by constructing a mental 
map of the territory and the route through 
it. Women tend to memorize the directions 
and use them to follow a path. 11 The test 

11 Hunt, 2002. 


evaluates a different underlying trait in men 
and in women, so comparing scores would 
be valid if we were interested in how well 
people did in direction finding, but not valid 
if we were interested in making an infer¬ 
ence about men’s and women’s standing 
on an underlying latent trait of orienting 
ability. 

In order to guard against such errors we 
have to have some way of showing that a 
test measures the same psychological trait 
in different groups. If we are dealing with a 
single test, and are concerned that different 
problems within the test tap different abili¬ 
ties in various groups, we look for evidence 
of differential item functioning. 

The idea behind differential item func¬ 
tioning can be illustrated by another hypo¬ 
thetical test, a test of baseball knowledge. 
The more you know about baseball, the 
better able you will be to answer ques¬ 
tions about baseball. The question “How 
many players are there on a team?” can 
be answered by anyone with the slightest 
knowledge of the game; the question “What 
is a squeeze play?” is a bit more challeng¬ 
ing. Many Americans who have not thought 
about baseball since grammar school can 
answer the first question but not the second; 
a sports reporter would be able to answer
both. More generally, we should be able to
order questions in terms of their difficulty. 

Now suppose we give this test to two 
groups of people, Americans and Aus¬ 
tralians. Baseball is played in Australia, so 
some Australians know more about baseball 
than some Americans. The question is “Does 
the test measure the same thing, baseball 
knowledge, in both countries?” If it does, 
then the order of difficulty of items should 
be the same in each country, even though 
Australians, on the average, might answer 
fewer questions correctly. My intuition is 
that the order of item difficulty would be 
the same. 

But let's go back to the direction-finding 
test, and this time the contrast is between 
men and women. Because the test measures 
orienting ability in men and verbal memory 
in women, there is no guarantee that the 
order of item difficulty would be the same. 
Suppose it is not. In that case we say that 
the test shows differential item functioning 
across groups. When differential item func¬ 
tioning is found, conclusions about group 
differences on a trait said to underlie a test 
cannot be drawn from observations of dif¬ 
ferences in test scores. 

The simplest way to test for differential 
item functioning is to calculate the rank- 
order correlation between item difficulties 
within each group. Anything other than 
a fairly high value (.8 or above) is evi¬ 
dence for differential item functioning. If 
the items on a test have been selected using 
Item Response Theory (IRT; see Chapter 4), 
a more sophisticated test is possible. The 
item parameters must be the same in both 
groups. 
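
The rank-order check is easy to carry out. The sketch below assumes that item difficulty is indexed by the proportion of each group answering the item correctly; the proportions are invented, and the .8 threshold is the rule of thumb given above rather than a formal test.

```python
import math

def ranks(values):
    """1-based ranks, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Rank-order correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Invented proportions correct on six items in three hypothetical groups.
group_a = [0.95, 0.85, 0.70, 0.55, 0.40, 0.20]
group_b = [0.90, 0.75, 0.60, 0.50, 0.30, 0.10]  # harder overall, same ordering
group_c = [0.50, 0.90, 0.20, 0.75, 0.10, 0.60]  # ordering scrambled

rho_ab = spearman(group_a, group_b)
rho_ac = spearman(group_a, group_c)
for label, rho in (("A vs B", rho_ab), ("A vs C", rho_ac)):
    verdict = "orderings agree" if rho >= 0.8 else "possible differential item functioning"
    print(f"{label}: rho = {rho:.2f} ({verdict})")
```

Note that groups A and B differ in overall level but not in item ordering, so the test appears to measure the same thing in both; group C would raise a flag.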

COMPARING GROUPS USING TEST 
BATTERIES, SUCH AS THE WAIS 
Differential item functioning refers to items 
within a test. Many studies attempt to draw 
conclusions about groups based on scores 
on a battery consisting of several tests. This 
would be the case, for instance, in a com¬ 
parison of WAIS scores. In general, the pur¬ 
pose of the comparison will not be to com¬ 
pare raw scores, but to compare the groups 
on some underlying factor, such as general 


intelligence (g), that is derived from the
scores on the individual tests. In technical
terms, we want to determine group differences
on a latent trait (g, R in the g-VPR
model, Gf, Gc, etc.) from observations on 
the manifest test scores. 

An important and deceptively simple- 
seeming case of such a comparison is 
Spearman's hypothesis, which Arthur Jensen
attributes to the pioneering psychometrician 
Charles Spearman. 12 The conjecture is that 
differences between groups on virtually all 
cognitive tests are largely due to differences 
in g, general intelligence. For instance, suppose
that we were to compare the grades
of African American and White high school 
students in a variety of subjects - mathe¬ 
matics, language use, history, art, and so on. 
According to Spearman’s hypothesis the dif¬ 
ferences between groups could largely be 
accounted for by differences in underlying 
general intelligence. 

Spearman’s hypothesis is a statement 
about group differences on a latent trait, 
g, rather than a difference on observable 
test scores. By definition, latent traits can¬ 
not be observed. So how do we evaluate 
the hypothesis? General intelligence, in the 
sense of g, is operationally defined as a factor
underlying scores on several (observable)
tests, such as the subtests of the WAIS.
Jensen has developed a technique called the
method of correlated vectors as a way to test
Spearman's hypothesis. 13 As this method has
been used widely, and erroneously, in the 
study of group differences, it is worthwhile 
understanding it. This serves as a good intro¬ 
duction to the more difficult task of under¬ 
standing a better method. 

If Spearman's hypothesis is valid, then 
those tests that are good measures of g 
should show large between-group differ¬ 
ences, while tests that are measures of spe¬ 
cial abilities should show smaller differ¬ 
ences. The extent to which a test is a 
measure of g is determined by its loading 

12 Jensen, 1998, p. 372. Jensen credits the hypothesis to
Spearman, but it appears to be Jensen’s summary of
Spearman’s ideas on the topic.

13 Jensen, 1998.

on the first (general) factor in a factor analysis
of a test battery (see Chapter 4). The
extent to which groups differ on each of 
the tests within a battery is determined by 
the d statistic for comparing groups on that 
test. In order to apply the method of cor¬ 
related vectors we compute the correlation 
between the test loadings and the d statistic, 
across tests. If the correlation is high, then 
the hypothesis is confirmed. 
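
The computation itself is very simple, which is part of the method's appeal. A minimal sketch follows; the subtest names, g loadings, and d values are invented and do not come from any real battery.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical battery: one g loading and one group difference (d) per subtest.
subtests = ["reasoning", "vocabulary", "arithmetic", "spatial", "digit span"]
g_loadings = [0.8, 0.7, 0.6, 0.5, 0.4]
d_values = [0.90, 0.80, 0.60, 0.55, 0.30]

# The "correlated vectors" statistic: correlate the two columns across subtests.
r_vectors = pearson(g_loadings, d_values)
print(f"correlation between g loadings and d values: {r_vectors:.2f}")
```

A high value shows only that the two columns of numbers rise and fall together across subtests. It is a statement about the subtests, not a direct estimate of the group difference on g.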

In spite of its popularity, the method 
of correlated vectors is seriously flawed. 
Depending upon the circumstances, it can 
either overstate the size of group differences 
on g when they are small 14 or, in other circumstances,
understate them when they are
present. 15 This means that we will have to
rethink many of the conclusions drawn by 
Jensen and others based upon its use. (It 
does not mean that these conclusions are 
wrong! It just means that they have to be 
held in abeyance until further analyses are 
done.) The reasons that the method is wrong 
are rather complicated, and so have been 
presented in Panel 11.2. Fortunately there 
is another method, MultiGroup Confirma¬ 
tory Factor Analysis (MGCFA), that is much 
more satisfactory. Hopefully this method 
will receive wider use in the future. 

This is not just a statistical nicety. There 
are many published conclusions that rest, 
somewhat shakily, on results obtained by 
using the method of correlated vectors. For 
instance, it has been used to make the 
very strong statement that African Ameri¬ 
can and White differences in intelligence are 
due to differences along the g dimension. 16 
The argument was that there is a fairly 
high correlation across subtests between 
(a) the g loading of the subtest and (b) 
the difference between White and African 
American scores on the subtest. At first 
glance, this is an argument for a difference 
between Whites and African Americans on 
g. But see Panel 11.2 for the problems of 
interpretation. 

14 Dolan & Hamaker, 2001; Dolan, Roorda, &
Wicherts, 2004.

15 Ashton & Lee, 2005. 

16 Nyborg & Jensen, 2000; Rushton & Jensen, 2005. 


11.2. Aging 

We all age. Binet assumed, and no one 
has disagreed, that cognitive competence 
increases with age up to early adulthood. 
At the other end of life, things are not so 
rosy. The incidence of dementia in peo¬ 
ple sixty-five and older is 10%, six times 
higher than the incidence in the population 
at large. 17 Dementia does not respect emi¬ 
nence. Ronald Reagan, the fortieth president 
of the United States, died with Alzheimer's 
disease at age ninety-three. He is believed 
by many to have suffered from early symp¬ 
toms of the disease during his second term of 
office (1985-89), when he was in his late sev¬ 
enties. However, dementia is certainly not 
an inevitable feature of old age. Konrad Ade¬ 
nauer, the first chancellor of West Germany 
following World War II, is largely credited 
with guiding his country out of chaos and 
into a major place among industrial nations. 
He assumed office at seventy-three and left 
it at eighty-seven. 

Theories of adult lives and aging stress 
how people find progressively more con¬ 
fined niches as they age. This is true in gen¬ 
eral, but there are exceptions. John Glenn 
(1921-) received a commission as a US Marine 
Corps aviator at age twenty-two, and served 
with distinction as a fighter pilot in both 
World War II and the Korean War. Fol¬ 
lowing the Korean War he became a test 
pilot and had a brief interlude as a TV quiz 
show contestant. He was selected in the first 
cohort of American astronauts, and was the 
first American to orbit the Earth. Follow¬ 
ing his retirement from government service 
in 1964 (at age forty-three) he became a 
successful businessman in a field that had 
nothing to do with aviation. He developed 
a strong interest in politics, and served as a 
United States senator from 1974 until 1999.
When he was seventy-seven he returned to 
space as a crew member on a space shuttle 
mission, thus becoming the oldest person to 
orbit the Earth. 


17 www.freemd.com/senile-dementia/incidence.htm, 
downloaded 18 November 2008.

Panel 11.2. The Method of
Correlated Vectors and the 
Alternative, MGCFA 

I first describe in somewhat more detail 
than in the main text just what the 
method of correlated vectors is. I then 
present the problems that it raises. 

The first step in applying the method is 
to determine that the same relationships 
hold between tests within each group to 
be compared. This is evaluated by a sta¬ 
tistical test called congruence that deter¬ 
mines whether or not the two corre¬ 
lation (or covariance) matrices can be 
assumed to be identical in each group. 
If they appear to be identical, the congruence
condition is said to be satisfied.
In a nonstatistical sense, what this means 
is that the observable subtest scores are 
related to each other in the same way 
in each group. Consider the hypotheti¬ 
cal direction-taking test described in the 
main text. If my intuitions are correct, 
this test would be highly correlated with 
a test of verbal memory in women, and 
with a test of rotational ability in men. 
The congruence condition would not be 
found. 

Jensen assumed that failure of congru¬ 
ence would be evidence that the tests 
within a battery were measuring different 
things in each population, and therefore 
that Spearman’s hypothesis could not be 
tested. 

Assuming that congruence is satisfied, 
the next step is to factor analyze the com¬ 
bined data, extract a first factor, which is 
the operational definition of g, and corre¬ 
late the factor loadings of each test with 
its associated d statistic, as described in 
the text. 

A Dutch psychometrician, Conor 
Dolan, has pointed out that in order to 
test Spearman’s hypothesis the data must 
satisfy a property called measurement 
invariance. The measurement invariance 
property is that the factor loadings for 
subtests should be identical within each 
group being compared. This ensures that 
two individuals, one from each group, 


who have identical scores on the latent 
traits (which is what we are interested in 
but cannot observe) also have identical 
scores on the test, which we can observe. 
In the examples given in the main text, 
measurement invariance would fail for 
the direction-finding test if it were to be 
incorporated into a test battery given to 
both men and women. 

Dolan and his colleagues showed that 
the method of correlated vectors can pro¬ 
duce positive findings in situations where 
measurement invariance does not hold.* 
Using a separate line of argument, Ash¬ 
ton and Lee have shown that the method 
can produce negative findings (the corre¬ 
lation between loadings and mean differ¬ 
ences can approach zero) in situations in 
which there actually is a difference in g. 1

Dolan advocated using Multi-Group 
Confirmatory Factor Analysis (MGCFA) 
to evaluate Spearman’s hypothesis and 
similar conjectures. The technique is a 
variety of confirmatory factor analysis, 
the statistical method explained in Chap¬ 
ter 4, section 4.2. MGCFA is prefer¬ 
able to the method of correlated vec¬ 
tors in two ways. One is that it involves 
explicit tests for measurement invari¬ 
ance. The other, which is not negligible, 
is that it allows investigators to com¬ 
pare two hypotheses, rather than eval¬ 
uating the weaker statement that a sin¬ 
gle hypothesis is a better than chance 
description of the data. The method 
of correlated vectors tests Spearman’s 
hypothesis against chance, not against 
competing hypotheses. 

The MGCFA technique has been 
known since the early 1980s. It requires 
relatively large samples from all groups, 
and calls for a good bit of expertise in the 
use of the required computer programs. 
Therefore, it is often not used when per¬ 
haps it should be. Hopefully this situa¬ 
tion will change over time, as the flaws in 
the simpler method of correlated vectors 
become widely known. 

* Dolan & Hamaker, 2001; Dolan, Roorda, &
Wicherts, 2004.

1 Ashton & Lee, 2005.

The message is straightforward. There are
changes in cognitive competence over the 
adult years, both upward and downward. 
They are complex. They are also extremely 
important to understand, because age is a 
variable that affects every one of us. 

11.2.1. Issues 

Three different designs have been used in 
the study of aging. The simplest is the cross- 
sectional design, where the investigator takes 
measurements on people in different age 
groups, at roughly the same time. The good 
aspects of cross-sectional designs are that 
they are relatively easy to implement, com¬ 
pared to other alternatives, and that they 
provide a picture of differences in the cog¬ 
nitive competencies of different age groups 
in the population, as it exists at the time of 
testing. This information can be projected 
forward over short time spans. For instance, 
if we know the incidence of senile dementia 
in people seventy years and older as of 2010, 
and we know the number of people in the 
sixty-to-seventy age bracket, it is possible to 
estimate the incidence of senile dementia 
expected in 2020. 

Cross-sectional designs pose problems 
of interpretation. It is difficult to obtain 
samples in which the participants differ 
only on age. For instance, elderly participants
(seventy and over) will report having
had less education, on the average,
than younger adults. This reflects changes in 
society, rather than psychological changes 
in the individuals, but it produces an 
inevitable confound between age and edu¬ 
cation effects. The general principle is that 
in cross-sectional designs age effects are confounded
with the cohort (Flynn) effects discussed
in Chapter 10. If we compare, in
2010, twenty-year-olds to seventy-year-olds, 
we are also comparing people in the 1990 
birth cohort to people in the 1940 cohort. 
There is no way to tell whether the differ¬ 
ence between the twenty- and seventy-year- 
olds is an age effect or a cohort effect. 
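
The confound can be made explicit with a toy model. It assumes, purely for illustration, that scores rise by three points per decade of birth year and do not change at all as a person ages; the function name and all numbers are invented.

```python
def score(birth_year, age):
    """Hypothetical score that depends on birth cohort only.

    The age argument is deliberately ignored: in this toy world nobody
    changes as they grow older.
    """
    points_per_decade = 3.0   # assumed cohort gain
    return 100 + points_per_decade * (birth_year - 1940) / 10

test_year = 2010
ages = (20, 45, 70)

# Cross-sectional design: different people, all tested in 2010.
cross_sectional = {age: score(test_year - age, age) for age in ages}

# Longitudinal design: the 1940 cohort, tested at each of the three ages.
longitudinal = {age: score(1940, age) for age in ages}

print("cross-sectional:", cross_sectional)
print("longitudinal:   ", longitudinal)
```

The cross-sectional comparison shows an apparent fifteen-point "decline" between ages twenty and seventy, although by construction no individual in this toy world ever declines. The data alone cannot distinguish this world from one in which everyone loses fifteen points with age and there is no cohort effect.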

Because cross-sectional studies measure 
each individual only once, there is also no 
way to distinguish between gradual and 


sudden changes in cognitive capability. To 
see this, consider two possible physiological
events: gradual deterioration of the central
nervous system (CNS) and sudden damage
due, for instance, to stroke. Either could
produce slowed decision-making. In a cross-sectional
design we would find that, on the
average, decision-making processes slowed 
with age. However, because different people 
would be evaluated at each age, we would 
have no way of knowing how much of the 
difference was due to gradual deterioration, 
which affected everyone, and how much 
was because the older groups would con¬ 
tain more people who had suffered sudden 
CNS damage due to stroke. 

In a longitudinal design the same peo¬ 
ple are studied across several time intervals. 
This makes it possible to observe age-related 
changes within an individual. The con¬ 
trast between cross-sectional and longitudi¬ 
nal results is informative. Intelligence test 
scores show considerably less drop over the 
adult years in longitudinal studies than they 
do in cross-sectional studies. This is at least 
in part due to the confounding of age and 
cohort effects in a cross-sectional study. 

Longitudinal studies are expensive and
time-consuming. They are also subject to strong
selection effects, for people with low test
scores are more likely to drop out than
people with high test scores. This is particularly
true if the study involves taking
multiple measures, thus requiring a considerable
investment in time on the part of
the study participants. Attrition, in other
words, is nonrandom. Unless allowance is made for
this effect, aging may appear to be less debilitating
than it actually is. 18
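
A simulation sketch shows how selective dropout can flatter the aging curve. Everything in it is assumed for illustration: every participant truly declines by five points, and low scorers are retained with probability .4 while everyone else is retained with probability .9.

```python
import random
from statistics import mean

rng = random.Random(42)
TRUE_CHANGE = -5.0   # assumed: every participant declines by five points

baseline = [rng.gauss(100, 15) for _ in range(10_000)]
followup = [score + TRUE_CHANGE for score in baseline]

def stays(score):
    """Assumed retention rule: low scorers are more likely to drop out."""
    p_stay = 0.4 if score < 90 else 0.9
    return rng.random() < p_stay

retained = [stays(score) for score in baseline]
followup_retained = [f for f, kept in zip(followup, retained) if kept]

# Naive comparison: everyone tested at wave 1 versus whoever is left at wave 2.
naive_change = mean(followup_retained) - mean(baseline)

# Change computed only within the people who were tested twice.
within_person = mean(
    f - b for f, b, kept in zip(followup, baseline, retained) if kept
)

print(f"true change:          {TRUE_CHANGE:.1f}")
print(f"naive group change:   {naive_change:.1f}")
print(f"within-person change: {within_person:.1f}")
```

The naive comparison understates the decline because the survivors started out with higher scores. In this sketch the within-person change is unbiased only because everyone declines by the same amount; if people who decline more are also more likely to drop out, even the within-person estimate will be too optimistic.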

The participants in a longitudinal study 
come from just one cohort. Therefore, the 
effects of general changes in society are 
mixed with the effects of aging. As an exam¬ 
ple, the percentage of women in Terman’s 
study who followed professional careers was 
high for its time, but was much lower than 
the percentage of women following profes¬ 
sional careers in the SMPY study, who were 

18 See Madhyastha et al., 2009, for an example and
discussion.




recruited fifty years later. Was this due to any
psychological difference between women 
born around 1910 and those born around 
1960, or was it due to the much greater career 
opportunities for young women in the 1980s 
than in the 1930s? 

The gold standard is the cohort- 
sequential design, such as the Seattle Lon¬ 
gitudinal Study described in Panel 9.2, in 
which investigators recruit people of differ¬ 
ent ages at the start, follow them as in a lon¬ 
gitudinal study, and in addition periodically 
recruit new participants and follow them as 
well. This allows for a separate evaluation 
of cohort and aging effects, and provides 
for longitudinal studies of different cohorts. 
However, cohort-sequential designs are very 
difficult to arrange. Aside from the expense, 
the biggest problem is ensuring comparabil¬ 
ity of the samples developed at each phase 
of recruitment. In the Seattle Longitudinal 
Study participants were recruited from peo¬ 
ple enrolled in a health maintenance organi¬ 
zation (HMO). If enrollments in the HMO 
have changed over the now fifty years of the 
study, recently recruited participants will 
not be comparable in all ways to earlier- 
recruited participants. There are also prob¬ 
lems of selective attrition, as is the case in a 
longitudinal design. 
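
The structure of a cohort-sequential design can be laid out as a table of cells, each defined by birth cohort, year of testing, and age. The sketch below is schematic: the starting year, testing interval, and entry ages are invented and are not those of the Seattle study.

```python
def design_cells(first_wave, n_waves, interval, ages_at_entry):
    """Return sorted (birth_cohort, test_year, age) cells for a design in which
    a fresh sample of each entry age is recruited at every wave and then
    followed at all later waves."""
    cells = set()
    for wave in range(n_waves):
        entry_year = first_wave + wave * interval
        for age in ages_at_entry:
            cohort = entry_year - age
            for later in range(wave, n_waves):
                year = first_wave + later * interval
                cells.add((cohort, year, year - cohort))
    return sorted(cells)

cells = design_cells(first_wave=2000, n_waves=3, interval=10, ages_at_entry=(30, 40))

# Same age, different cohorts: differences here reflect cohort effects.
cohorts_at_40 = {cohort for cohort, year, age in cells if age == 40}
# Same cohort, different ages: differences here reflect aging.
ages_of_1970_cohort = {age for cohort, year, age in cells if cohort == 1970}

for cell in cells:
    print(cell)
print("cohorts observed at age 40:", sorted(cohorts_at_40))
print("ages observed for the 1970 cohort:", sorted(ages_of_1970_cohort))
```

Because some cells share an age but differ in cohort, and others share a cohort but differ in age, the two kinds of comparison can be made separately, which neither a purely cross-sectional nor a purely longitudinal design allows.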

These problems make the study of aging 
difficult. They do not make it impossible. 
The effects of aging will be discussed under 
three general headings: studies using psy¬ 
chometric techniques, studies concentrating 
on information-processing and physiological 
variables, and what turns out to be a very 
important, albeit amorphous, set of findings, 
“other cognitive changes.” 

11.2.2. A Psychometric View of Aging 

Many psychometric studies of changes in 
intelligence with age have utilized Cattell 
and Horn’s distinction between fluid and 
crystallized (Gf vs. Gc) intelligence. Horn 
summarized this work, as of the late twen¬ 
tieth century, by asserting that Gf peaks in 
the mid twenties and then falls fairly rapidly, 
with the fall accelerating in old age (sixty-
five plus), while Gc peaks around thirty and


then is maintained at a surprisingly constant 
level until old age. 19 His conclusion has been 
widely accepted. The fact that age has dif¬ 
ferent influences on fluid and crystallized 
intelligence is one of the strongest nonpsy¬ 
chometric arguments for the importance of 
making the Gf-Gc distinction, rather than 
collapsing Gc and Gf into a single construct, 
general intelligence (g). 

In Carroll’s expansion of the Gf-Gc 
model, crystallized and fluid intelligence are 
two of several broad second-stratum abilities,
below g but less specialized than primary
level abilities, such as vocabulary and
image rotation. Other important second- 
stratum abilities are visual and auditory rea¬ 
soning, short-term memory, and long-term 
memory retrieval. In a rather-less-cited part 
of his argument, Horn observed that all 
the second-stratum abilities except Gc show 
declines from roughly age thirty onward. 

Horn relied heavily on cross-sectional 
studies to draw these conclusions. Such 
studies are confounded by cohort effects. 
This point is particularly troubling because 
the cohort effect is much stronger for measures
of Gf than for measures of Gc. 20 If
Gf has increased across cohorts, a cross- 
sectional study will show decrements in 
Gf with age. Therefore, the result of an 
abbreviated cross-sequential study of the 
Woodcock-Johnson testing program is of 
considerable interest. 

The Woodcock-Johnson tests, mentioned 
briefly in Chapter 2, are battery-type tests 
generated from the Carroll-Cattell-Horn 
three-stratum model of intelligence. The 
testing program includes tests suitable for a 
wide range of ages, from early childhood to 
adulthood. The test batteries contain mark¬ 
ers for Gf, Gc, and several other second- 
stratum abilities defined by the Cattell-Horn 
model. They also contain a measure of 
“broad cognitive ability,” which is analogous 
to the general intelligence (third-stratum)
factor in Carroll’s extension of the model. 21 

19 Horn, 1985; see also Horn & Noll, 1994, and Cattell,
1987.

20 Flynn, 1987. 

21 Carroll, 1993.

Figure 11.2. Broad cognitive ability (BCA) factor scores as a
function of age at testing. The lines indicate scores at the two points of time
of testing. Heavy lines show quartiles. From McArdle et al., 2002,
Figure 8, reprinted with permission.


Because the different subtests have been calibrated
using item response theory (Chapter 2),
they provide a score that can be
treated as a linear scale of the trait under¬ 
lying each subtest. This makes it possible to 
talk about differences in the rate of decline 
of various measures of intelligence. 22 

The Woodcock-Johnson test was revised 
and renormed in 1990, to produce the WJ-R 
test. 23 The renorming was based on an 
(approximate) probability sample of the US
population, containing about 6,500 cases. 
Subsequently J. J. McArdle, a professor at 
the University of Virginia, and his colleagues 
contacted and retested just under 1,200 of the 
original participants. 24 Chronological ages at 
the time of first testing ranged from two to 
ninety years. The interval between testing 
times ranged from one to five years. Fig¬ 
ure 11.2 presents the raw scores for broad 
cognitive ability (g) as a function of age.
There is a peak of intelligence in the early 
twenties, and a gradual decline thereafter. 
However, there are considerable individ¬ 
ual differences. Some participants in their 

22 Hunt, 2007, Chapter 10. 

23 McGrew, Werder, & Woodcock, 1991. 

24 McArdle et al., 2002. 


seventies have scores that would be consid¬ 
ered quite high for a twenty-five-year-old. 

Different aspects of intelligence peak and 
decline at different rates. This is shown in 
Figure 11.3, which presents changes in Gf 
and Gc as a function of age. The two graphs 
clearly follow different trajectories of change 
over age. This point is reinforced by some 
numbers. Table 11.1 shows the peak age and 
the rate of decline (in a transformation of 
z scores, a point that need not concern us
here). All the second-stratum broad abilities
except Gc behave very much like Gf, crest¬ 
ing in the mid-twenties and then declining 
throughout the adult years. The exception, 
Gc, is important because Gc is a better pre¬ 
dictor of workforce performance than Gf. 
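
To get a feel for what these yearly rates imply, a rate can be multiplied out over the fifty-five years from age twenty to age seventy-five. The short Python sketch below does this for Gf and Gc, using the rates as read from Table 11.1. It assumes, purely for illustration, that the decline is constant across the whole interval, and the function name is made up for this example.

```python
# Rates of change per year (W units), ages 20-75, as read from Table 11.1.
RATE_PER_YEAR = {"Gf": -0.5, "Gc": -0.01}

def cumulative_change(rate_per_year, start_age=20, end_age=75):
    """Total change in W units, assuming a constant yearly rate.

    The constant rate is an illustrative simplification, not a claim
    about the shape of the actual age curves.
    """
    return rate_per_year * (end_age - start_age)

changes = {name: cumulative_change(rate) for name, rate in RATE_PER_YEAR.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.2f} W units from age 20 to 75")
```

Under this simplification the cumulative drop in Gf is roughly fifty times the drop in Gc, which is the contrast the table is meant to convey.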

McArdle and colleagues’ data clearly
provides support for the three-stratum 
model as an appropriate way to understand 
the influence of age on intelligence. It is 
worth noting, though, that the effects in 
their data are not nearly as striking as the 
description of age-related declines presented 
in some discussions. This can be seen by 
contrasting Figures 11.2 and 11.3 with Horn’s 
stylized presentation of the data, shown in 
Figure 11.4. Gc and Gf do follow different 
Figure 11.3. Age and change scores for Gf and Gc. Gf peaks sooner than Gc, but falls off more sharply. 
From McArdle et al., 2002, Figure 9, reprinted with permission. 


courses over the life span, but the difference 
does not approach that shown in the stylized 
summary. 

Schaie has reported data that complements
the findings of McArdle and colleagues.
Figure 11.5, based on data from
the Seattle study, shows the cumulative 
age changes, from age twenty-five onward, 
for three different factors: verbal meaning, 
inductive reasoning, and general intellectual 
ability. These factors were chosen because 
an examination of the subtests on which 
they were based suggests that they should 


resemble the Gc, Gf, and “broad cognitive
ability” (g) scales in the WJ-R battery. Comparing
Figure 11.5 to Figures 11.2, 11.3, and 11.4,
we see that the two studies are in agreement
in showing that verbal/Gc measures are substantially
more resistant to age than measures
of inductive reasoning/Gf.

The studies disagree somewhat about 
precisely where the declines come. The 
McArdle study indicates declines starting 
in the mid-twenties for Gf and in the 
mid-thirties for Gc. The Seattle Longitu¬ 
dinal Study data shows declines starting 


Table 11.1. Ages at peak value and rate of decline (in “W units,”
which are linearly related to the IRT logit scale) for various
factors evaluated on the WJ-R test

                                     Age at Highest   Rate of Change per Year,
WJ-R Composite                       Value            Ages 20-75

Fluid reasoning (Gf)                 22.8             -.5
Comprehension and knowledge (Gc)     35.6             -.01
Retrieval from long-term memory      18.1             -.4
Short-term memory                    24.2             -.3
Processing speed                     25.1             -.6
Auditory processing                  22.7             -.4
Visual processing                    24.5             -.4
Broad quantitative ability           29.0             -.3
Broad academic knowledge             29.8             -.3
Broad cognitive ability (g)          26.2             -.3

Source: Data from McArdle et al., 2002, Table 9, reprinted with
permission.

Figure 11.4. A stylized picture of the age trends for different broad 
second-level traits in the Gc-Gf model of intelligence. While the 
contrast between the Gf and Gc trends is maintained in Figure 11.3, 
the contrast is much less in the data than in the stylized depiction. 
From Horn, 1986, p. 52. 


considerably later. The truth may lie somewhere
in between. Age trends in the WJ-R
renorming were confounded with cohort
effects, which would lead to an overestimation
of the deleterious effects of aging. In
the Seattle Longitudinal Study participants 
have dropped out over time, so there is a 
bias toward continued participation by peo¬ 
ple with higher initial abilities. This would 
result in an understatement of the age effect. 

There has been continuous debate about 
the extent to which the pervasiveness of 
the general intelligence factor increases with 
advanced age. This is called the dedifferentia¬ 
tion hypothesis. The hypothesis predicts that 
the correlations between measures of differ¬ 
ent types of intelligence, such as verbal and 
spatial-visual reasoning, will increase with 
age. 

Powerful arguments can be advanced for 
the dedifferentiation hypothesis. Declines 
in intelligence are associated with declines 
in health. Injuries that influence the brain 
and nervous system should, on logical 
grounds alone, have widespread deleteri¬ 
ous effects on cognitive performance. Non- 
pathological reductions in the prefrontal 
cortex and other areas associated with the 


working memory-attention-executive func¬ 
tion complex, which are typical of aging, 
are associated with lower fluid intelligence 
scores. 25 Reduction in the capabilities of the 
working memory-attention complex should 
have a pervasive effect on almost all cog¬ 
nitive functions, which should lead to an 
increased influence of individual differences 
in g upon performance. 

Despite this argument, the evidence for 
the dedifferentiation hypothesis is mixed. 
There is some support for an increase in cor¬ 
relations between cognitive variables with 
age in the Seattle Longitudinal Study. 26 
However, inspection of the quartile lines in 
Figures 11.2 and 11.3 shows that there was lit¬ 
tle increase in the variance of g with age in 
the WJ-R sample. Other detailed multivari¬ 
ate studies of psychometric cognition and 
aging have produced evidence both for and 
against the dedifferentiation hypothesis. 27 

Psychometric studies of aging present a 
consistent pattern. Crystallized intelligence 

25 See, for instance, Raz et al., 2008. A general review is provided by Kramer, Fabiani, & Colcombe, 2006.

26 Schaie, 2005, pp. 213-215. 

27 de Frias et al., 2007; Juan-Espinosa et al., 2002; Tucker-Drob & Salthouse, 2008.
Figure 11.5. Cumulative age changes from age twenty-five for three 
different aspects of intelligence. Scores are shown in T units, where 
10 is the variance at age twenty-five. Data from the Seattle 
Longitudinal Study (Schaie, 2005), Table 5.1. The “intellectual 
ability” scale is not an average of the other two. It includes measures 
of spatial reasoning and numerical skill, both of which show 
substantial decrements past middle age. 


increases to a peak in early middle age, 
and remains constant throughout the adult 
working years. Fluid intelligence peaks 
somewhere in the young adult years and 
then, on the average, declines through the 
latter half of the working years and old age. 
However, some people have high fluid intel¬ 
ligence test scores well into late adulthood. 

11.2.3. Changes in Information-processing 
Capacity with Age 

The picture with respect to changes in 
information processing with age is quite 
clear. Cognition gets slower, and the work¬ 
ing memory-attention-executive function 
complex does not function as well as it 
used to. 

Cognitive slowing can be demonstrated 
using either psychometric or laboratory 
techniques. In psychometric studies of cog¬ 
nitive skills the examinee is asked to do 
something very simple, quickly, such as 
crossing out the “a’s” in a text. Laboratory
studies are a bit more complex. Choice
reaction time and perceptual decision tasks 
(Chapter 6) are favorites. All methods 
demonstrate marked slowing with age. For 
example, in the three-stratum model per¬ 
ceptual speed is a second-level ability, on a 


par with Gf and Gc. In the WJ-R study pro¬ 
cessing speed declined with age more rapidly 
than any other second-level ability, including
Gf (Table 11.1). A large-scale survey of
the UK population found marked slowing 
of both simple and choice reaction times 
with age, with choice reaction time being 
the more sensitive variable. 28 

There is a regular relationship between 
the time that young and old individuals take 
to accomplish a laboratory task such as a 
reaction time task. The time that young 
adults take to complete a task increases with 
task complexity, and the time that older 
people take to complete the same task is a 
multiplicative function of the time that the 
young adults take. The nature of the rela¬ 
tionship is shown in Figure 11.6, for over 175 
tasks and for multipliers ranging from 1 to 2. 
Algebraically, the relationship can be sum¬ 
marized by 

$L_{\text{old}} = a\,L_{\text{young}}$,    (11.2)

where L is the latency of the task and a is a 
multiplier greater than 1. 29 The greater the 
age disparity, the greater the multiplier. 
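
A small numerical sketch may make the multiplicative relationship concrete. The Python below predicts the older group's latency from the younger group's latency for a given multiplier, and estimates the multiplier from paired latencies by least squares through the origin. The latencies are invented for illustration and are not taken from Cerella's data; the function names are hypothetical.

```python
def predict_old_latency(young_latency, a):
    """Latency of the older group under the model L_old = a * L_young."""
    return a * young_latency

def estimate_multiplier(young, old):
    """Least-squares slope for a line through the origin: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(young, old)) / sum(x * x for x in young)

# Invented mean latencies (seconds) on four tasks of increasing complexity.
young = [0.4, 0.6, 0.9, 1.2]
# Constructed so that each older-group latency is 1.5 times the younger one.
old = [0.6, 0.9, 1.35, 1.8]

a_hat = estimate_multiplier(young, old)
print(f"estimated multiplier: {a_hat:.3f}")
print(f"predicted older latency for a 1.0 s task: {predict_old_latency(1.0, a_hat):.3f} s")
```

Because the model has no intercept, a single multiplier describes the whole relationship: the more complex the task, the larger the absolute gap between the groups, even though the ratio stays fixed.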

28 Der & Deary, 2006. 

29 Cerella, 1990. 

Figure 11.6. Reaction times lengthen with age. A 
summary of over 175 studies in which older and 
younger groups were compared on the same 
reaction time task. The time in seconds required 
by the older group is shown on the ordinate; the 
time required by the younger group is shown on 
the abscissa. From Cerella, 1985, Figure 1D, 
reprinted with permission. 

While the fact of slowing is clear, both 
the reason for and the implications of slow¬ 
ing are less clear. It has been suggested that 
the reason for slowing is that as people age 
there is a decrease in the reliability of infor¬ 
mation transmission in the neural system. 30 
Trying to pinpoint just why this should hap¬ 
pen would lead us into a discussion of age 
effects on the brain, which is not our major 
concern. The implications of slowing are of 
concern. 

Two models have been proposed. 31 Cog¬ 
nitive slowing might cause problems in other 
aspects of thought. For example, executive 
functioning, allocating attention to concurrent
or nearly concurrent tasks (like driving
a car while listening to the radio), is
possible only if information can be evalu¬ 
ated as it comes in. A person whose cogni¬ 
tive processing is slowed is simply less able 
to keep up with the world. This can cer¬ 
tainly be demonstrated in laboratory tasks. 
Whether it is a factor in familiar situ¬ 
ations, in which people have developed 

30 For example, see the model proposed by Myerson 

et al., 1990. 

31 Hartley, 2006; Salthouse, 1996. 


strategies for dealing with relevant informa¬ 
tion processing, is another issue. Adults over 
fifty regularly play card games, and often 
play them quite well. On the other hand, 
they do get into accidents because they do 
not react quickly enough to an emerging sit¬ 
uation. 

In addition to whatever direct effects 
slowed cognitive processing has, slowing 
also serves as a marker for the general state 
of the nervous system. Therefore, slowed 
processing with age is important both in 
itself and as an indicator of general cogni¬ 
tive health. 

Whichever of the two models is correct, 
it is clear that slowing is a major but not 
the entire cause of declines in intelligence 
with age. Figure 11.7 presents an idealization 
of results from a Swedish study in which 
changes in a general cognitive factor (g) were 
determined with and without statistical con¬ 
trols for changes in perceptual processing 
speed. Controlling for processing speed on 
perceptual tasks, such as figure identifica¬ 
tion, virtually eliminates the change in cog¬ 
nition from age fifty to sixty-five, and slows, 
but does not eliminate, the decline past 
sixty-five. 32 

As people age there is a breakdown in 
the working memory-control of attention- 
executive processing complex. This is shown 
by decreased performance on short-term 
memory tasks, greater susceptibility to inter¬ 
ruption, and poorer control on dual tasks, 
such as simultaneously monitoring a stream 
of visual and auditory signals. 33 These behav¬ 
ioral measures are, in general (a) those that 
are most closely related to psychometric 
measures of g and Gf and (b) those that are
supported by the frontal cortex-cingulate 
cortex-parietal cortex complex. 

These results strongly suggest that the 
decline in the ability to deal with new prob¬ 
lems (Gf) is caused by a reduced ability to 
construct and manipulate problem repre¬ 
sentations in working memory. The deficit 
is probably due to reduced functioning of 

32 Finkel & Pedersen, 2004. 

33 Hoyer and Verhaeghen, 2006; Kramer, Fabiani, & 

Colcombe, 2006; Salthouse et al., 2003. 


Figure 11.7. Changes in the level of a generalized cognitive trait (g) 
over the second half of the life span. The measurement of the latent 
trait has been set to a mean of 30 and a standard deviation of 10 to 
avoid negative numbers. The change is shown with and without 
statistical control for speed on perceptual tasks. Data from Finkel & 
Pedersen, 2004, Table 3. 


the frontal-parietal system, which shows 
anatomical deterioration with age. 34 The 
failure in functioning appears to be general, 
rather than associated with a particular part 
of the working memory-control of attention 
complex. 35 

11.2.4. Practical Knowledge, Experience,
and Wisdom 

Given what has been presented so far, the 
reader may well have decided that some¬ 
where around age forty-five people become 
slow, easily distracted, poorly focusing dun¬ 
derheads. But how is it that men and women 
in their fifties and sixties occupy most of the 
important leadership positions in our soci¬ 
ety? Why are people advised to go to expe¬ 
rienced physicians when physicians do not 
become experienced until their forties? Why 
do we routinely trust our lives to commer¬ 
cial airline captains, most of whom are in 
their fifties and sixties? 

Experience counts... a lot. The typical 
test of crystallized intelligence (Gc) evalu¬ 
ates how well a person knows his or her soci¬ 
ety. The same Swedish longitudinal study 
that documented a drop in general cogni¬ 
tive functioning from age fifty onward found 


increases from age fifty to seventy-five in the 
extent to which people could answer general 
information questions. 36 

This finding is typical of many showing 
that Gc is either stable or increases until 
great age. The same thing is true of more 
pointed measures of functioning in life. In 
the Seattle Longitudinal Study participants 
were asked a number of questions about 
what the investigators refer to as “basic living 
skills.” These varied from questions about 
personal finances to questions about how to 
cope with failures in household appliances. 
Figure 11.8 shows the results. There was very 
little decline in basic living skills until age 
sixty-five. This result is typical of several 
other studies that have investigated practical 
intelligence in the elderly. 

Although they may have trouble dealing 
with the sorts of novel, out-of-context prob¬ 
lems that populate IQ tests (and especially 
tests of Gf), people in midlife and beyond do 
quite well in dealing with realistic decision 
making. 37 On the other hand, the elderly do 
not do particularly well in made-up decision 
situations, especially when they are under 
time pressure. This could be due in part to 
general cognitive slowing, in part to the need 
to keep several factors in mind when making 


34 Kramer, Fabiani, & Colcombe, 2006; Raz et al., 2008.

35 Salthouse, 1996, 2005.

36 Finkel & Pedersen, 2004.

37 Marsiske & Margrett, 2006; Sternberg, 2003a.
Figure 11.8. Scores on a test of basic living skills, measured on two 
different occasions. There is very little decline in basic living skills 
until past age sixty-five. Data from Schaie, 1996. 


a complex decision, and in part to older indi¬ 
viduals' tendency to stress accuracy rather 
than speed in decision-making situations. 

Why is there a difference? Laboratory 
decision-making tasks are constructed to 
reveal the process of decision making, in 
situations where all participants have little 
prior experience with the problem at hand. 
Outside the laboratory, decision making is 
heavily influenced by the possession of rele¬ 
vant information. 38 People use the heuristics 
that are “condemned” in the laboratory, and 
they work. 39 The information-processing
burden shifts from working memory to long¬ 
term memory and retrieval processes. The 
latter processes are relatively immune to 
age-related decline until quite late in life. 

The heuristics used outside the labora¬ 
tory rely upon the decision maker's having 
experience with similar decision-making sit¬ 
uations - which is precisely what mature 
decision makers have. This sort of reasoning 
has been extended to what some have called 
“wisdom.” The intuitive idea is that as you 
age you acquire wisdom - or at least, you 
should. But how do we distinguish wisdom 
from practical problem solving? 

The theme that I have extracted from 
the literature is that practical problem 
solving and decision making involve prob¬ 
lems that the individual faces in the here 
and now, while wisdom refers to thinking 
about broader, more ephemeral issues, often 
involving the course of society. For instance, 


a group of German researchers have devel¬ 
oped the Berlin Wisdom Theory, in which 
wisdom is defined as “knowing the rules 
and meaning of life.” 40 Sternberg has defined 
wisdom as knowing how to apply creativity 
and intelligence for the common good, by 
balancing personal and societal interests. 41 
Gerontologists claim that wisdom is more 
typical of the old than the young. That may 
be, but I would like to have clearer defini¬ 
tions and a better analysis of individual dif¬ 
ferences within the aged population before 
I would unequivocally associate age with 
wisdom. 

11.2.5. A Summary Comment on Aging,
and a Remark on Healthy Aging 

The study of changes in intelligence with age 
has produced four well-established findings. 
They are: 

1. Fluid intelligence decreases over the 
adult life span, while crystallized intel¬ 
ligence remains stable or even increases 
slightly until people are into the retire¬ 
ment years. 

2. There is a generalized slowing of cogni¬ 
tion with age. It is measurable as early 
as the forties, and can influence men¬ 
tal competencies outside of the labora¬ 
tory as we approach the seventies and 
beyond. The slowing is pervasive. 

3. There is a similar drop in the function¬ 
ing of the working memory-attention 


38 Gigerenzer, 2000; Klein, 1998, 2009. 

39 Klein, 2009. 


40 Brugman, 2006. 

41 Sternberg, 2003a. 











control-executive functioning system 
over the life span. As is the case with 
general slowing of reactions, the degree 
of deficit accelerates past the mid¬ 
sixties. 

4. Laboratory studies underestimate the 
effectiveness of older individuals when 
they are faced with familiar problems, 
especially in situations in which they 
have developed expertise. In such situa¬ 
tions people in their forties and beyond, 
even into their sixties, may actually out¬ 
perform younger individuals. 

It is important not to romanticize the 
last finding. It refers to the performance of 
individuals who have experienced “healthy 
aging.” While, as far as I can find out, this 
term has never been defined precisely, it 
clearly refers to two things. 

First, the individual must not have 
experienced serious health problems that 
impact on cognition. These include, but 
are not limited to, cardiovascular problems 
and dementia-producing diseases, such as 
Alzheimer’s disease and the later stages 
of Parkinson’s disease. The dementias may 
unfold gradually, so any population of 
apparently healthy elderly people (arbitrar¬ 
ily, anyone over sixty) will include some pre¬ 
dementia cases. As the population ages the 
dementias will develop to the point that the 
individuals cannot live on their own. The 
high incidence of dementia in the elderly 
poses a major public health problem. 

Second, in order to stay healthy the 
individual must remain engaged with soci¬ 
ety. Intelligence, like physical fitness, does 
not exist in a vacuum. It requires main¬ 
tenance. Studies of healthy young adults 42 
have shown that people vary in intellec¬ 
tual engagement, and that those who engage 
with the world are, in general, more intelli¬ 
gent than those who do not, where intelli¬ 
gence is measured by psychometric and lab¬ 
oratory studies. Whether engagement causes 
intelligence, or intelligence causes engage¬ 
ment, is hard to say. There are probably 
reciprocal influences. 

42 Ackerman, 1996; Ackerman & Beier, 2001. 


Longitudinal studies of the elderly, such 
as the Seattle Longitudinal Study and the 
Swedish study of aging twins, 43 have shown
that this trend is more pronounced as 
we grow older. Withdrawal and height¬ 
ened anxiety are statistically associated with 
lower intelligence. This becomes a seri¬ 
ous problem as people age, because the 
frequency of threatening life events will 
increase with advancing age. Loss of a job 
is far more threatening to a person of fifty 
or sixty than to one of twenty or thirty. 
Loss of a spouse becomes more likely with 
advancing age. How we handle such events 
may both tell a good deal about someone’s 
present intelligence - in the conceptual 
rather than in the narrower psychometric 
sense - and have implications for mainte¬ 
nance of intelligence following the trauma. 

11.3. Male-Female Differences 
in Intelligence 

Are men smarter than women? Or is it the 
other way around? It depends on whom you 
ask, and how the question is framed. 

Historically it has been assumed that 
men are generally more competent than 
women. This belief has been so widespread 
that some powerful women have felt that 
they should downplay either their abil¬ 
ity or their femininity. Panel 11.3 presents 
some political examples that have stuck 
in my mind. Similar disparities are seen 
in other fields of accomplishment. When 
Charles Murray compiled a statistical analy¬ 
sis of major contributions in various fields he 
found the ratio of eminent men to eminent 
women ranged from 50:1 (mathematics) to 
10:1 (literature). 44 

Historically, there have been entrenched 
rules limiting women’s participation out¬ 
side of home and hearth. In today’s 
post-industrial society legal discrimination 
against women's participation in various 
social roles has been very much reduced. 
Informal social rules and attitudes dictating 

43 Schaie, 1996, 2005; Wetherell et al., 2002. 

44 Murray, 2003. 
Panel 11.3. Historic Examples and 
Historic Commentaries 

Here are a few historic examples of 
accomplished women who have either 
hidden or downplayed their intellectual 
characteristics. 

Hatshepsut (fifteenth century BCE), 
one of the very few women pharaohs, had 
herself depicted with a false beard, proba¬ 
bly to preserve an image of a ruler. Some 
two millennia later Elizabeth I of Eng¬ 
land, one of the most powerful women 
who ever lived, famously described her¬ 
self to her troops as “weak and feeble”: 

I know I have the body of a weak and
feeble woman . . .

Elizabeth I, speech to English
army assembled to defend against a
Spanish invasion, 1588*

Actually, Elizabeth was not all that 
uncertain about herself. It is not clear 
whether she actually spoke the words 
just quoted, or whether this quotation 
was made up by her advisors, in an early 
effort at public relations spin. In a better- 
documented speech to Parliament she 
had said 

I thank God that I am endued with such
qualities that if I were turned out of the
Realm in my petticoat I were able to live
in any place in Christendom.

Elizabeth I, speech to
Parliament, 1566†

Given her career, I suspect that the 
second quote, rather than the first, bet¬ 
ter reflected Elizabeth I’s self-image. The 
interesting thing is that either she felt it 
necessary to assert a “weakness of fem¬ 
ininity” in the first quote, or, if she did 
not actually say this (and the historical 
record is weak here), it was necessary for 


her supporters, quoting her speech, to say 
that she said it. 

The image of weak-willed femininity 
was still held in England two hundred 
years later, when Mary Wollstonecraft 
presented the first known argument for 
full female participation as movers and 
shakers in society. The second chapter of 
her book was titled “The Prevailing Opin¬ 
ion of a Sexual Character Discussed.” It 
began 

Many ingenious arguments have been
brought forward to prove, that the two
sexes, in the acquirement of virtue,
ought to aim at attaining a very different
character: or, to speak explicitly,
women are not allowed to have sufficient
strength of mind to acquire what
really deserves the name of virtue.

- Mary Wollstonecraft, opening
lines of Chapter II of A Vindication
of the Rights of Women . . . (1792)

Wollstonecraft was certainly right 
about what people thought. Five years 
earlier the framers of the Constitution of 
the United States (1787), considered an 
extremely progressive political manifesto 
in its time, denied women the right to 
vote. 

On the other hand, at the time 
Wollstonecraft was writing Catherine II 
(“the Great,” 1729-96, ruled 1762-96) was 
Empress of all the Russias, and an auto¬ 
crat’s autocrat. She followed Elizabeth of 
Russia (1709-62, ruled 1741-62), who was 
a powerful figure herself. 

Conflicts between women’s actual 
accomplishments and widespread beliefs 
about the relative abilities of men and 
women are not just a modern phe¬ 
nomenon. 

* Shapiro, 2006. 
† Ibid.
differences in women’s roles vary substantially
across countries and across social
groups within countries. Nevertheless, no 
modern society, and probably no past soci¬ 
ety, assigns exactly the same roles to 
men and women. The form of differenti¬ 
ation varies greatly, ranging from the sub¬ 
stantial egalitarianism of modern Scandi¬ 
navia to the much more rigid assignments 
of gender roles in conservative Islamic
states. 

Although societies vary in what they see 
as appropriate roles for men and women, 
there is one universal. Childbearing is dic¬ 
tated by biology. Child care is almost exclu¬ 
sively a female role, especially for younger 
children. This gender-differentiated role 
extends beyond caring for one's own chil¬ 
dren. Nursery, pre-school, and elementary 
school teachers are predominantly, although 
not exclusively, women. 

Outside of the child-care role, assign¬ 
ments in modern industrial society are more 
equitable than they have been in past soci¬ 
eties, but the extent of the gender differen¬ 
tiation varies over time and place. Men pre¬ 
dominate in “traditionally masculine” fields, 
such as firefighting, police work, mechan¬ 
ical trades, military service, and aviation. 
There are more women than men in sec¬ 
retarial trades, nursing, and different vari¬ 
eties of interpersonal counseling. Some of 
these distributions of roles have seen rapid 
changes in the past fifty years. Medicine, for 
instance, was once almost exclusively a male 
occupation. Today women make up slightly 
over half the graduating classes from Amer¬ 
ican medical schools. 

The clergy provide an interesting case. 
Before 1950 women clergy were virtually 
unknown in the Catholic, Protestant, and 
Jewish faiths. Today women regularly serve 
as clergy in Protestant and Jewish commu¬ 
nities. The Catholic Church forbids women 
to serve as priests. This distinction is inter¬ 
esting because the duties of the clergy 
are entirely intellectual and interpersonal. 
Strength, speed afoot, and the ability to ori¬ 
ent in space, all characteristics that display 
strong sex differences, are irrelevant to the 
duties of a church leader. 


Outside of child rearing, why should sex- 
role differentiation be so pervasive? Sto¬ 
ries (with only indirect substantiation) about 
“man the hunter and woman the gleaner”
present a plausible case for evolutionary dif¬ 
ferences in sex roles, based on men’s supe¬ 
rior size and strength. But why should these 
practices persist in modern society, where 
computer programming is far more important
than hunting skill? Could it be that
the differentiation is due to male-female
differences in cognition? Or is it
due to social inertia, because we are still
stuck with Neolithic thinking about what
women can do?

These questions do not have easy 
answers. Diane Halpern, a professor at the 
Claremont Graduate Schools and a well- 
respected reviewer of the modern literature 
in the field, had this to say: 

It seemed like a simple task when I started
writing this book. . . . At the time it seemed
to me that any between-sex differences
in thinking abilities were due to socialization
practices, artifacts and mistakes
in the research, and bias and prejudice.
After reviewing a pile of journal articles
that stood several feet high, and numerous
books and chapters that dwarfed the
stack of the journal articles, I changed my
mind. The task I had undertaken certainly
wasn't simple and the conclusions that I
had expected to make had to be revised.

Halpern, 1986, Introduction

The next few sections show why Halpern 
found the topic so complex. 

11.3.1. General Intelligence I: The Evidence 
from Studies of Battery-type Tests 

The case for differences between men and 
women in general intelligence, or g, rests 
on three pieces of evidence: the results 
from overall scores derived from battery- 
type tests, such as the IQ score derived from 
the Wechsler tests; the results from factorial 
studies in which individual scores are com¬ 
puted for the g factor derived from a vari¬ 
ety of test batteries, including both avowed 
intelligence tests and batteries constructed 
for research purposes; and the results from 
studies of individual tests, such as the Raven 
Matrices tests, that purport to be measures 
of g. We will look first at the results from 
the battery-type tests, and then examine 
the results from the individual tests and 
some issues involving tests used for person¬ 
nel screening. 

Analyses of the adult standardization
samples of the WAIS-III and WAIS-R generally
show a small difference in IQ in favor of
men. The results are consistent across countries,
running from two to three IQ points
in the United States and Canada 45 (in deviation
units, d = .19) to four points (d = .27)
in China and Japan. 46 These results are also
close to the results obtained in earlier stud¬ 
ies, showing consistency in time. 47 There is 
a somewhat similar picture when we look 
at children’s data. IQ differences are on the 
order of one to two points in favor of boys 
in both the US and the Netherlands. 48 
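
The deviation unit d is the difference between two group means divided by the standard deviation of the scores. The Python sketch below converts between IQ points and d, assuming the conventional IQ standard deviation of 15; that value and the function names are assumptions of the example rather than details reported by the cited studies.

```python
IQ_SD = 15.0  # conventional standard deviation of the IQ scale (assumed here)

def iq_points_to_d(points, sd=IQ_SD):
    """Express a mean difference in IQ points as a standardized effect size d."""
    return points / sd

def d_to_iq_points(d, sd=IQ_SD):
    """Express a standardized effect size d as a difference in IQ points."""
    return d * sd

for d in (0.19, 0.27):
    print(f"d = {d:.2f} corresponds to {d_to_iq_points(d):.2f} IQ points")
```

On this scale d = .19 works out to just under three IQ points and d = .27 to about four, in line with the ranges quoted above.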

These results are not exclusive to the 
Wechsler tests. Deary and his colleagues 
report an elegant example, involving data 
from the NLSY79, described in panel 9.9. 49
The data is of interest because the motiva¬ 
tion for constructing the ASVAB, predic¬ 
tion of performance in the armed services, 
is rather different than the motivation for 
constructing the Wechsler tests, which were 
designed for use by clinical and educational 
psychologists. 

Deary and colleagues contrasted the 
scores of brothers and sisters of the sev¬ 
enteen to twenty-three-year-olds who had 
taken the test, thus controlling for family 
background. The male-female difference in 
deviation scores on the AFQT, the gen¬ 
eral score derived from the ASVAB and a 
good measure of Gc, was —.02, a small, and 
not statistically reliable, finding in favor of 
females. 

For the final piece of data, we look at 
a previously referenced, very large survey 
of over 70,000 schoolchildren in the United 

45 Longman, Saklofske, & Fung, 2007.

46 Dai et al., 1991; Hattori & Lynn, 1997. 

47 Matarazzo et al., 1986; Snow & Weinstock, 1990.

48 Born & Lynn, 1994. 

49 Deary, Irwing, et al., 2007. 


Kingdom, who took the Cognitive Abilities 
Test battery at age eleven. 50 Two factors 
were derived from this test, a general intelli¬ 
gence factor and a residual verbal factor. No 
male-female differences were found on the 
general factor. There was a slight advantage 
for girls on the residual verbal factor. 

What this review shows is either no dif¬ 
ference or a very small difference in gen¬ 
eral intelligence in favor of males. Richard 
Lynn has argued that there is actually a 
greater male advantage in intelligence than 
the tests reveal. 51 His argument is based on 
two claims. 

Lynn’s first argument is that girls mature 
more rapidly than boys, and that cogni¬ 
tive competence increases with physiolog¬ 
ical age, rather than with calendar age. The 
male-female difference might be small, and
even negative (reflecting a female advantage),
prior to puberty, but a male advantage
would appear after adolescence and continue
throughout adulthood.

Lynn is correct that male-female differ¬ 
ences are smaller in childhood than in adult¬ 
hood, although not very much smaller. It is 
not clear whether this should be regarded 
as an artifact or simply an observation. By 
analogy, male-female discrepancies in height
are smaller (and can even show a female advantage)
in childhood than in adulthood, but
this is not an artifact of the way we mea¬ 
sure height; girls do get closer to their adult 
height than boys do in their pre-teen and 
early teen years. To the extent that cogni¬ 
tive growth mirrors physical growth, one 
could argue that girls are, in fact, smarter 
than boys during the elementary and mid¬ 
dle school years, a point that should be 
taken into account in situations where entry 
into higher levels of education is deter¬ 
mined by performance in the elementary 
years. 

Lynn’s second argument is that at all ages 
the tests are biased against men. He states, 

The adult male advantage of around 4 

IQ points obtained by averaging the ver¬ 
bal comprehension, reasoning and spatial 

50 Deary, Strand, Smith, & Fernandez, 2007. 

51 Lynn, 1999. 
Table 11.2. Selected scores and male-female comparison for
brother-sister pairs in the NLSY79 data set

                                                       Effect Size in d Units
Test                       Male Mean    Female Mean    (Male-Female)

Word knowledge               22.3          22.9            -.07
Paragraph comprehension       9.2          10.0            -.21
Arithmetic reasoning         15.9          14.7            +.17
Mathematics knowledge        11.9          11.9            0.00
AFQT standardized score      -.034         +.034           -.02
AFQT g score                 15.08         14.68            .06

Source: Data extracted from Deary et al., 2007, Table 1, with permission
from Elsevier.


abilities is not generally found in the full
scale IQ of the Wechsler tests or in the overall
IQ of similar tests because the spatial
abilities are typically under-represented in
these tests.

- Lynn, 1999, p. 2

To what extent is this argument plau¬ 
sible? Summary scores, such as the widely 
used WAIS Full Scale, Verbal, and Perfor¬ 
mance (FSIQ, VIQ, PIQ) scores are deter¬ 
mined by a weighted combination of scores 
on subtests. If men have higher scores on
some subtests and women on others, then
a summary score can be made to favor men
over women, or vice versa, simply by
changing the weights assigned to the subtests.
And it is certainly true that if a test bat¬ 
tery omits an important ability on which 
there are male-female differences, then the 
balance of men and women’s scores in an 
overall index will be different than it would 
have been had the omitted ability been 
evaluated. 
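The dependence of a summary score on its weights can be made concrete with a minimal sketch. It assumes two standardized subtests with equal and opposite male-female differences and a common correlation between them; the numbers, the function name `composite_d`, and the weights are all hypothetical and are not taken from any real battery.

```python
import math

def composite_d(d, w, r):
    """Male-female difference, in SD units, on a weighted sum of subtests.

    d: per-subtest differences in within-group SD units (male minus female)
    w: weights applied to the standardized subtests
    r: correlation assumed to hold between every pair of subtests
    """
    # Difference in composite means: weighted sum of the subtest differences.
    mean_diff = sum(wi * di for wi, di in zip(w, d))
    # Variance of the composite: squared weights plus covariance terms.
    variance = sum(wi * wi for wi in w)
    for i in range(len(w)):
        for j in range(i + 1, len(w)):
            variance += 2 * w[i] * w[j] * r
    return mean_diff / math.sqrt(variance)

# Hypothetical values: men ahead by .20 on subtest A, women ahead by .20 on B.
subtest_d = [0.20, -0.20]
for weights in ([1, 1], [2, 1], [1, 2]):
    print(weights, round(composite_d(subtest_d, weights, r=0.5), 3))
```

With equal weights the two differences cancel; doubling the weight on either subtest tips the composite toward the group that does better on it. The correlation term is included because the spread of the composite, and hence the size of the difference in deviation units, depends on it as well as on the weights.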

Arthur Jensen has argued that the way 
out of this dilemma is to compare men’s and 
women's measurements (factor scores) on g, 
defined as the primary factor extracted from 
a battery of tests of different aspects of intel¬ 
ligence. The argument is that the weighting 
of individual subtests will then be done by 
rational analysis of the data, rather than by 


using weights that were arbitrarily assigned 
to the subtests. 52

The two approaches can lead to differ¬ 
ences in the calculation of overall male- 
female differences. This is illustrated by 
Deary and colleagues' analysis of the 
NLSY79 data. Table 11.2 presents their data 
separately for male-female differences on 
the four subscales of the ASVAB used to 
compute the AFQT. As was mentioned ear¬ 
lier, the AFQT score showed essentially 
no difference between men and women. 53
However, when Deary and his colleagues 
applied Jensen’s techniques, and computed 
male-female differences on a g index derived 
from factor analysis, there was a male advan¬ 
tage of .06 standard deviation units - not a 
lot, but still a reversal of the direction com¬ 
puted from the composite AFQT score. 

Several researchers have conducted simi¬ 
lar analyses of the WAIS and similar batter¬ 
ies. Some have found evidence for a small 
advantage in g for men. Most of these have 
relied on the method of correlated vectors, 
which, as was explained in section 11.1, is 
a questionable technique. Studies using the
better-justified MGCFA procedure find no

52 See Chapter 4 for a detailed discussion of the logic 
of this approach. 

53 The effect of composite scores upon an overall 
index, and upon the deviation scores of the index, 
is not determined solely by the weights of each 
component. The variance of the components and 
the correlation between the components are also 
important. 
difference, 54 but these studies can be criticized
for their low statistical power. 55

If there are any male-female differences 
in general intelligence indices derived from 
the commonly used battery-type tests, those 
differences are quite small. 

No statistical analysis can address a 
stronger form of Lynn's argument, that 
appropriate subtests are not included at all. 
Women generally do better than men on 
verbal tests, and men may do markedly bet¬ 
ter than women on visual-spatial tests, especially
those tests evaluating the R (rotational)
aspect of visual-spatial reasoning in
the g-VPR model. But does this mean that 
those batteries that are now widely used, 
such as the WAIS, “underrepresent” visual- 
spatial reasoning, as the quotation from 
Lynn implies? 

To answer this question one would have 
to know what the proper representation of 
verbal, visual-spatial, and other traits is. 

If the test battery as a whole is to be vali¬ 
dated by its ability to predict performance in 
an applied setting, then the appropriateness 
of adding or subtracting a particular subtest 
can be determined by seeing if the addition 
improves accuracy of prediction. A power¬ 
ful case can be made that adding spatial- 
visual tests might do this in some situations, 
but might not in others. 56 For that matter, 
there are some situations in which an argu¬ 
ment can be made for ignoring spatial-visual 

54 See Colom et al., 2000, and Dolan et al., 2006, for 
Spain; Jensen, 1998, Table 13.1, for American and 
United Kingdom data; and Van der Sluis, Derom, 
et al., 2008, for data from Belgium and the Netherlands.
Nyborg (2005) reports a substantial male
advantage, but he used a small sample of people 
who agreed to participate in extended testing, and 
hence may not have satisfied the requirement of 
having samples that were equally representative of 
men and women, due to recruitment effects. 

55 Molenaar, Dolan, & Wicherts, 2009. 

56 See Humphreys and Lubinski, 1996, for some 
of these arguments. The appropriate subtests to 
include in a composite battery depend upon what 
the battery is meant to do. The ASVAB legitimately 
has different subtests than the WAIS. If a test bat¬ 
tery is to be used for personnel selection, substan¬ 
tial legal and ethical issues arise when changing a 
test battery results in differential changes in the fre¬ 
quency of prediction of success for men and women. 
It must be shown that these changes increase pre¬ 
dictive validity. 


ability tests in favor of a more extended eval¬ 
uation of verbal testing. 

If the test is intended to measure intelli¬ 
gence, in the abstract, questions about what 
to include in a test battery are unanswer¬ 
able without a theory of what intelligence is, 
defined independently of the tests. This illustrates
(once again) the intellectual poverty
of de facto acceptance of the argument that 
“intelligence is what the intelligence test 
tests.” 

11.3.2. General Intelligence II: The
Evidence from Tests Said to Be Markers 
for General Intelligence 

In theory, a way to study male-female dif¬ 
ferences while avoiding the problem of hav¬ 
ing to justify the composition of a test bat¬ 
tery would be to look at men’s and women’s 
scores on a pure measure of g, and compare 
the scores obtained in an accurate sample 
of a large population, such as the popula¬ 
tion of a country, where the possibility of 
differential recruitment of men and women 
into the population would not be at issue. 
In practice, it is difficult, if not impossible, 
to find such a study. No pure measure of 
g exists; the best we have are progressive 
matrix tests. The most widely used of these, 
the Raven tests, do measure g, but in most 
populations they contain a significant visual- 
spatial reasoning component. 57 

Lynn has put the problem succinctly: 

Few people will be persuaded that general 
intelligence can be so narrowly defined as 
to consist solely of fluid ability measured 
by the Progressive Matrices. General intel¬ 
ligence is generally regarded as consisting of 
a broader range of cognitive abilities which 
would include the verbal and spatial sec¬ 
ond order factors. 

- Lynn, 1999, p. 6

This poses a problem when RPM scores 
are used to determine male-female differ¬ 
ences in g. As will be documented later, 
there is substantial evidence that there are 
moderate to large male-female differences in 

57 Johnson & Bouchard, 2005a, Table 2; Palmer et al.,
1985.




some types of visual-spatial reasoning. Any 
difference in RPM scores between men and 
women will reflect a difference both in g 
and in visual-spatial reasoning. Even if g is 
the predominant contributor to the RPM 
score, and there are no male-female differ¬ 
ences in g, a moderate male-female differ¬ 
ence in visual-spatial reasoning would pro¬ 
duce a small difference in RPM scores. 
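The arithmetic behind this point can be sketched under a simple assumption: the RPM score is a linear combination of two uncorrelated, standardized latent traits (g and visual-spatial reasoning) plus noise, scaled so that the within-group standard deviation of the score is one. The loadings of .8 and .3 and the latent difference of .5 are hypothetical values chosen only for illustration.

```python
def observed_d(loadings, latent_d):
    """Observed group difference implied by differences on latent traits.

    Assumes the observed score has within-group SD of 1 and that each
    loading is the correlation between the score and one latent trait.
    """
    return sum(loading * d for loading, d in zip(loadings, latent_d))

# Hypothetical loadings of the RPM on g and on visual-spatial reasoning.
rpm_loadings = [0.8, 0.3]
# No male-female difference in g, a moderate one (.5) in visual-spatial reasoning.
latent_differences = [0.0, 0.5]

print(round(observed_d(rpm_loadings, latent_differences), 2))  # 0.15
```

Under these assumed values a moderate visual-spatial difference, with no difference in g at all, yields an RPM difference of .15 deviation units, which is the kind of small difference the text describes.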

The second problem is that the desired 
conclusion is a statement about differences 
in g, or lack of them, between men and 
women in general. Such a statement can be 
justified only if it is based on studies where 
the participants can be thought of as reason¬ 
ably close to a probability sample of some 
defined large population. 

A review of studies conducted prior to 
1980 concluded that there were no male- 
female differences in RPM scores. 58 Lynn 
and Irwing properly criticized this review 
for having included a large number of con¬ 
venience samples that made no claim of 
being representative of any national pop¬ 
ulation. They then conducted two meta¬ 
analyses of their own, one based on the 
RPM and another, intended to represent col¬ 
lege students, that contained studies using 
both the RPM and the more difficult Raven 
Advanced Progressive Matrices (RAPM). 59 
They concluded that in adults men score 
higher than women by approximately .3 
deviation units, which they regarded as 
equivalent to 4.5 to 5 points on the IQ scale. 
The male advantage was somewhat smaller 
in the early teen years, which would be 
in accord with Lynn's suggestion that prior 
to adolescence girls are both physically and 
cognitively more mature than boys of the 
same age. 

Their conclusions must be taken with 
a grain of salt. Lynn and Irwing present 
their results as generalizations about male- 
female differences in RPM scores across 
countries. This means that the representa¬ 
tion of male and female examinees in a study 
must be reasonably representative of males 
and females in the relevant country. At a 

58 Court, 1983. 

59 Lynn & Irwing, 2004a, 2005. 


minimum, the male:female ratio in the sam¬ 
ple should be roughly equal to the likely 
male:female ratio in the population. Fail¬ 
ure to meet this requirement is evidence 
that some unknown recruitment effect may 
be distorting the results. An unfortunate 
number of the studies upon which Lynn 
and Irwing base their case do not meet this 
criterion. 

Here are a few examples. A study said to 
represent Israel was actually a study of chil¬ 
dren in a kibbutz, surely a nonrepresentative 
sample of modern Israel. A study involving 
200 people was offered as representative of 
India, a nation of over one billion. A sam¬ 
ple of Brazilians aged twenty to forty con¬ 
tained over 1,900 men and 740 women. This 
is a huge distortion of the male:female ratio 
for that age group. In another study a sam¬ 
ple of “American college students” was actu¬ 
ally taken from a single university, which 
stresses its preeminence in engineering and 
agriculture. 60 The male:female ratio in the 
study was approximately 9:2. According to 
records I obtained from the university's web 
site the enrollments of men and women 
were approximately equal during the time 
of the study. Such obvious deviations from 
representativeness make the application 
of meta-analytic techniques questionable. 
Additional criticisms have been made of the 
college student analysis, on other grounds. 61 

A somewhat different picture emerges 
from the rather scanty reports that have 
been made of standardization studies. In 
discussing the 1979 standardization of the 
RSPM, which analyzed data from a city that 
had a demographic profile resembling the 
national profile, rather than from a popula¬ 
tion sample, John Raven reported that there 
were no male-female differences in progres¬ 
sive matrix scores. 62 No mention is made of 
male-female differences in his discussion of 
British and American standardizations in the 
1990s, but it is not clear whether a test for 
such differences was ever done. A collection 
of papers has been published that includes 

60 Lynn & Irwing, 2004b. 

61 Blinkhorn, 2005. 

62 Raven, 2000.
reports of several national standardizations
of the Raven tests, and a number of smaller 
studies, mostly of children and adolescents. 
No male-female differences are reported. 63 

A major point in Lynn’s argument is that 
the difference in RPM scores shifts toward a 
male superiority from childhood to adoles¬ 
cence. Statistically, this would amount to an 
age × sex interaction. In seven of the eight
studies of children and adolescents in which 
a comparison between the age nine to ten 
and age fifteen to sixteen scores could be 
made, there was a shift toward better male 
performance with increasing age. 64 

Two conclusions can be drawn from these 
frustratingly incomplete results. The first is 
that if there is any systematic difference 
between men's and women's scores on the 
Raven tests, it is a small one. Otherwise 
it would show up much more clearly in 
studies that do approximate national sam¬ 
ples. The second is that the difference, if it 
does exist, could be due to either the g or 
the visual-spatial latent traits that underlie 
performance on progressive matrix items. 
Carlson and I were right when we stated 
our tenth principle for interpreting studies 
of group differences: investigators should be 
more willing than they are to say “We don't 
know.” 


63 Raven & Raven, 2008.

64 Lynn & Irwing, 2004a, Table 1.


11.3.3. The Variance Issue

If there is such a small difference in general
intelligence (if any) between men and
women, why do we care? What is all the fuss
about?

Men’s scores on measures of general intelligence
are more variable than women’s
scores. This difference, combined with a
small difference in mean scores, implies that
there will be substantial differences between
men and women at both extremes of the
intelligence distribution, those of above-normal
and below-normal intelligence. To
explain this a brief excursion into the conceptual
meaning of variance is needed.

Suppose we were to select a randomly
chosen man or woman and try to guess his
or her score on an intelligence test. Our best
guess is the expected value, represented by
$E(x_m)$ or $E(x_w)$, depending on whether we
are talking about a man or a woman. Assuming
a normal distribution for intelligence,
this is the mean for the appropriate distribution;
for men, $E(x_m) = \mu(x_m)$, where $\mu(x_m)$
represents the mean of the distribution of
male scores. A similar expression applies for
women, using the w subscript instead of m.
However, we can also expect to miss by
some amount, and the size of the misses will
increase with the variability of the scores.
Let $E(x_m - \mu(x_m))$ be the expected miss
for men. (The same argument would follow
for women, with w substituted for m.)
The expected miss will be zero, because
overestimates will balance out underestimates.
However, the square of the misses,
$[x_m - \mu(x_m)]^2$, will always be non-negative
(and will be positive unless all scores are
equal to the mean score). The more widely
the scores are distributed about the mean,
the larger the value of the expected squared
miss: $E((x_m - \mu(x_m))^2)$. This is the conceptual
meaning of variance.

The relative variability in male and
female distributions of scores is measured
by the variance ratio, VR,

$$VR = \frac{E\big((x_m - \mu(x_m))^2\big)}{E\big((x_w - \mu(x_w))^2\big)}$$




VR will be one if men’s scores and women’s 
scores are equally variable, greater than 
one if men’s scores are more variable than 
women's, smaller if women’s are more vari¬ 
able than men's. 

In studies where the sample can be 
regarded as approximating a probability 
sample of the population, the male:female 
variance ratio has been found to be greater 
than one, both in overall scores (indicators 
of g) and in scores on subtests evaluating par¬ 
ticular aspects of intelligence, such as verbal 
comprehension or spatial-visual reasoning. 
Here are some typical observations: 
Figure 11.9. The distribution of CAT mean scores in a British
sample of over 300,000 eleven-year-old schoolchildren. The normal
distribution is shown as a reference point. Note the excess of boys
at the extreme values (stanines 1-3 and 8-9) and the excess of girls
in the middle stanines (4-6). Data from Strand et al., 2006,
Appendix B.


1. The WAIS-III provides a general mental
ability (GAI) index, defined by mean
scores on the verbal and nonverbal reasoning
subtests. The means for men and
women, respectively, are 100.1 and 97.9,
a difference of .147 standard deviation
units. The standard deviations for men
and women are 15.1 and 14.7, with a
corresponding variance ratio of
(15.1/14.7)^2 = 1.06. 65

2. In the NAEP 2004 data for seventeen-year-olds,
the mean scores for mathematics
were 308 for men and 305 for women.
The variance ratio was 1.23. For reading,
the scores were 278 for men and 293 for women.
The variance ratio was 1.33. 66

3. In the 2001-03 period 320,000 eleven-
to twelve-year-old children enrolled in
United Kingdom schools took the Cognitive
Abilities Test (CAT-III; see
Chapter 2). The mean score across all
subtests was 99.1 for boys and 99.9 for
girls, an effect size of .05 in favor of girls.
The variance ratio was 1.13. 67
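The summary figures in these observations can be recomputed from the reported means and standard deviations. The sketch below is only a check on the arithmetic; the function names are illustrative, and the standard deviation of 15 used for the effect sizes is an assumption (the conventional IQ scale), not a value stated for each test.

```python
def variance_ratio(sd_male, sd_female):
    """Male:female variance ratio from the two standard deviations."""
    return (sd_male / sd_female) ** 2

def variance_ratio_from_se(se_male, se_female):
    """Variance ratio from standard errors of the means.

    Valid only if the two groups have the same number of examinees,
    because then SE = SD / sqrt(n) with the same n in both groups.
    """
    return (se_male / se_female) ** 2

def effect_size(mean_male, mean_female, sd):
    """Male minus female difference in standard deviation units."""
    return (mean_male - mean_female) / sd

# WAIS-III figures quoted in the first observation.
print(round(variance_ratio(15.1, 14.7), 2))        # 1.06
print(round(effect_size(100.1, 97.9, 15.0), 3))    # 0.147

# CAT figures quoted in the third observation; negative means girls score higher.
print(round(effect_size(99.1, 99.9, 15.0), 2))     # -0.05
```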

65 Lange et al., 2006, Table 2. 

66 Data downloaded from
nces.ed.gov/nationsreportcard/lttnde/viewresults.asp,
December 2008. The variance ratios were calculated from
the standard errors, on the assumption that there
was an equal number of men and women.

67 Strand, Deary, & Smith, 2006. 


Figure 11.9 uses the British study to 
illustrate what this difference in variances 
implies for the overall distribution of scores. 
In this study there was a trivial difference 
in means between boys and girls, but a 
reasonably large variance ratio. The figure 
shows the distribution of boys’ and girls’ 
scores in stanines. 68 The standard normal
curve (recalibrated to stanines) has been 
superimposed on this distribution. The girls’ 
scores are leptokurtic, that is, somewhat 
more bunched-up than would be expected 
if they were normally distributed. The boys’ 
scores are platykurtic, less bunched-up than 
would be expected. They are also slightly 
skewed to the left. As a result, there will be a 
higher percentage of boys than girls at both 
the high and low ends of the distribution, 
even when, as is the case in the UK data, 
there is virtually no difference in means. 


68 A stanine score is a linear transformation and quantization
of a standard score. Stanines take the values
1-9, with a mean of 5 (equivalent to a standard
score of zero) and a standard
deviation of 2. For stanines 2-8 the interval covered
by a stanine has the width of one-half a standard
deviation (stanine range of 1). For instance, stanine
5 covers the standard score interval −.25 < z < .25.
Stanine 6 covers the interval .25 < z < .75, and so
forth. Stanine 9 covers the interval 1.75 < z < +∞,
and stanine 1 covers the interval −∞ < z < −1.75.
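The stanine definition can be written as a short function. This is a minimal sketch of the standard rule (half-SD bands centered on 5, with open-ended bands for 1 and 9); the function name is illustrative, and sending a score that falls exactly on a boundary to the higher stanine is an assumption, since the definition leaves boundary values unspecified.

```python
import math

def stanine(z):
    """Convert a standard score z to a stanine (1-9).

    Each of stanines 2-8 spans half a standard deviation; stanine 5 is
    centered on z = 0. Scores beyond +/-1.75 fall in stanines 9 and 1.
    """
    band = math.floor(2 * z + 5.5)   # half-SD bands, with 5 covering -.25 to .25
    return max(1, min(9, band))      # clamp the open-ended extremes

for z in (-3.0, -0.3, 0.0, 0.3, 2.0):
    print(z, stanine(z))
```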














Panel 11.4. An Illustration of the 
Effects of Small Differences in 
Means and Variances 

The differences in intelligence test scores 
between men and women in both means 
and variances are small, in the absolute 
sense. In combination they imply sub¬ 
stantial changes in the ratio of men to 
women at the extreme high and low 
ends of the distribution. This is impor¬ 
tant, because socioeconomic contribu¬ 
tions (loosely, “eminence”) may depend 
predominantly upon talent at the high 
end of the intelligence scale, while socio¬ 
economic costs are created dispropor¬ 
tionately by people at the low end of the 
distribution.* 

The effects can be seen by comparing
the standard normal distribution (z,
mean = 0, standard deviation = 1), which
will be used to represent the female score
distribution, F, with three possible male
distributions. They are M1
(mean = .15 d units, standard deviation =
1), M2 (mean = 0, standard deviation =
1.05), and M3 (mean = .15 d units, standard
deviation = 1.05). As a standard deviation
of 1.05 corresponds to a variance
ratio of 1.10, these values are in the range
of empirically observed values for male-
female mean differences and male:female
variance ratios.

Figure 11.10 (a) shows the three M dis¬ 
tributions, superimposed on the F dis¬ 
tribution. There appears to be little dif¬ 
ference between them. Figure 11.10 (b) 
presents the data from a different per¬ 
spective. It shows the ratio of males to 
females at different points on the z scale, 
comparing the three M distributions to 
the F distribution, and assuming that 
there is an equal number of males and 
females in the population. This distri¬ 
bution is approximately correct for ages 
twenty to forty. The male:female ratios 
are close to one in the center of the z 
scale (i.e., the range of “normal” intelli¬ 
gence) and strikingly larger than one in 
both the upper and lower ranges. 

* Gelade, 2009; Herrnstein & Murray, 1994; 

Murray, 1998, 2003. 


This example illustrates a general prin¬ 
ciple. Slight differences in the means and 
variances of a distribution have little effect 
on the distribution of most of the scores, but 
can have substantial effects on the upper and 
lower tails of the distribution. Panel 11.4 and 
Figure 11.10 illustrate how large the effect can 
be. The combination of a slight difference 
in means and variances has very little effect 
on the distribution of intelligence test scores 
across men and women in the “generally nor¬ 
mal” range, say from IQ equivalents of 80 to 
120, which is where 80% of all scores lie, but 
can produce substantial differences in the 
frequencies of men and women among the 
top and bottom 10%. 
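The comparison described in Panel 11.4 can be reproduced numerically. The sketch below assumes exactly normal distributions and equal numbers of men and women, as the panel does, and computes the male:female ratio both at a point on the scale (ratio of densities) and beyond a cutoff (ratio of tail areas). The function names are illustrative; the means and standard deviations are those given in the panel.

```python
from statistics import NormalDist

# Female reference distribution and the three male distributions of Panel 11.4.
F = NormalDist(0.0, 1.0)
MALE = {
    "M1": NormalDist(0.15, 1.00),   # higher mean only
    "M2": NormalDist(0.00, 1.05),   # greater variability only
    "M3": NormalDist(0.15, 1.05),   # both
}

def density_ratio(male, z):
    """Male:female ratio among people scoring exactly z."""
    return male.pdf(z) / F.pdf(z)

def upper_tail_ratio(male, z):
    """Male:female ratio among people scoring above z."""
    return (1.0 - male.cdf(z)) / (1.0 - F.cdf(z))

def lower_tail_ratio(male, z):
    """Male:female ratio among people scoring below z."""
    return male.cdf(z) / F.cdf(z)

for z in (-3, -2, -1, 0, 1, 2, 3, 4):
    row = "  ".join(f"{name}={density_ratio(dist, z):5.2f}" for name, dist in MALE.items())
    print(f"z={z:+d}  {row}")
```

Under these assumptions the M3 density ratio at z = 4 comes out at about 3.4, close to the value of 3.5 read from Figure 11.10. The tail-area functions are included because selection into a program usually depends on scoring above or below a cutoff rather than at a single point.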

The different frequencies of males and 
females at the extreme ends of the distri¬ 
butions have consequences for educational 
programs. Boys markedly outnumber girls 


in special education programs, and in other 
indices of low but not abnormal intelligence, 
by a ratio of about 2:1. 69 At the other end 
of the academic spectrum, unless there is 
an administrative decision to require equal 
numbers of boys and girls, boys usually out¬ 
number girls in programs for gifted students. 
This was true in the Study of Mathemati¬ 
cally Precocious Youth (SMPY), which was 
discussed in Chapter 10. Recall that part of 
this study involved three different cohorts, 
which can be described as being in the top 
1 in 100, 1 in 200, and 1 in 10,000 in the 

69 The ratio depends upon the criterion used for 
admission to the program. In the United States 
those states that offer fewer services, and hence have 
a stricter criterion for admission to the program, 
have higher M:F ratios (Coutinho & Oswald, 2005). 
Similar ratios have been reported for Europe (Skarbrevik,
2002).




Figure 11.10. The effect of small differences in means and variances
upon the distribution of scores at various points on the d scale.
(Horizontal axis: intelligence in d units.)


distribution of SAT scores. The corresponding
male:female ratios were 1.5:1, 2.1:1,
and 11.2:1.

While differences in variance may be part 
of the explanation for the excess of men over 
women in the extremes of the intelligence 
distribution, this cannot be the whole story. 
On the positive end of the scale, a d scale 
value of 4 (IQ = 160) corresponds to the 
“1 in 10,000” cohort in the SMPY. Accord¬ 
ing to Figure 11.10, the expected male:female 
ratio at this point should be 3.5, which is not 
even close to the 11:1 observed ratio. The 
observed male:female ratio in special education
classes (roughly equivalent to d = −1.5,
IQ = 78 and below) is also higher than would
be expected on the basis of differences in 
variance alone. 

Wendy Johnson and her colleagues at the 
University of Edinburgh have suggested a 


reason for the overrepresentation of males 
at the low end of the distribution. 70 They 
assumed that the distribution of intelligence 
actually consists of two distributions: a dis¬ 
tribution of the intelligence of normally 
developing individuals, which is centered 
slightly above the IQ = 100 point, and a dis¬ 
tribution of individuals who have been sub¬ 
jected to either biological or environmental 
disturbances that disrupt normal develop¬ 
ment. This distribution, which is consider¬ 
ably smaller than the first, is centered on the 
IQ = 80 point. Assuming that both distri¬ 
butions have a standard deviation of 15 IQ 
points, about 75% of the individuals in the 
disrupted group would have IQs above 70, 
the usual criterion for the mentally dis¬ 
abled. Therefore, the disrupted-development 

70 Johnson, Carothers, & Deary, 2008, 2009. 
population would consist largely of people
whose intelligence was in the low normal, 
rather than the pathological, range. Johnson 
and colleagues further assumed that more 
males than females fall into the disrupted- 
development population. This is consistent 
with a great deal of data showing that males 
are generally more at risk for biological dis¬ 
ruption than females. 
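The two-distribution idea can be checked with a few lines of arithmetic. The means of 80 and the standard deviation of 15 come from the description above; the mean of 101 for the normally developing group and the 5% share for the disrupted group are hypothetical values chosen only to illustrate the shape of the argument, not figures reported by Johnson and colleagues.

```python
from statistics import NormalDist

SD = 15.0
disrupted = NormalDist(80.0, SD)          # mean and SD as described in the text
typical = NormalDist(101.0, SD)           # hypothetical: "slightly above 100"
DISRUPTED_SHARE = 0.05                    # hypothetical mixing proportion

# Share of the disrupted group with IQ above 70.
share_above_70 = 1.0 - disrupted.cdf(70.0)
print(round(share_above_70, 3))           # about 0.75

def mixture_cdf(iq):
    """Proportion of the combined population scoring below iq."""
    return (1.0 - DISRUPTED_SHARE) * typical.cdf(iq) + DISRUPTED_SHARE * disrupted.cdf(iq)

# Compare the low tail of the mixture with a single normal curve (mean 100).
single = NormalDist(100.0, SD)
print(round(mixture_cdf(70.0), 4), round(single.cdf(70.0), 4))
```

Even a small disrupted-development group thickens the low tail relative to a single normal curve, so if males are overrepresented in that group they will be overrepresented among low scorers.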

Johnson and her colleagues also pointed 
out that greater male variability would be 
expected if the (yet unidentified) genes 
for general intelligence are located on 
the X chromosome, because the male 
genetic potential would then depend upon a 
smaller, and hence more variable, sample of 
the alleles for intelligence than would be the 
case for women, with two X chromosomes. 
The assumption is not unreasonable, for we 
know that genes leading to severe cognitive 
pathologies are overrepresented on the X 
chromosome. A direct test of the hypothe¬ 
sis will have to wait until the genes for nor¬ 
mal variations in the genetic potential are 
located. Johnson and colleagues' assump¬ 
tions are sufficient to account for deviations 
from the normal distribution in low scores 
from two Scottish surveys of intelligence 
in eleven-year-olds, taken in 1932 and 1947. 
Similar excesses of low scores have been 
observed in other data sets. 

This leaves us without a proposal to 
explain the excess of males at the upper 
ends of the intelligence distribution. We 
take up this issue subsequently, in discus¬ 
sions of educational issues, in section 11.3.6. 
First we look at male-female differences on 
dimensions of intelligence other than g. Here
we find much clearer differences. 

11.3.4. The Cognitive Differences
between Men and Women 

Although there is at most a small difference 
between men and women in general intelli¬ 
gence, there are substantial differences along 
some of the dimensions of intelligence. In 
discussing these we will take a top down 
approach. We first look at the results from 
national surveys. These provide a broad¬ 
brush view of the nature of sex differences in 


cognition, but do not provide details because 
the economics and logistics of very large test¬ 
ing programs rule out close examination of 
any one cognitive trait. We then look at psy¬ 
chometric research studies, which provide a 
more detailed look at individual traits, at the 
expense of not using nationally representa¬ 
tive samples, but keeping the constraints of 
the conventional testing paradigm. Finally 
we look at laboratory studies using the tech¬ 
niques of cognitive psychology. These stud¬ 
ies provide a much finer look at individual 
behavior, because they relax the constraints 
of the testing paradigm, at the cost of 
not studying correlations between traits, 
and using quite unrepresentative samples. 
These approaches complement each other, 
in much the same way that the progres¬ 
sion from public health surveys to laboratory 
research provides complementary sources of 
information in the biomedical sciences. 

We begin with a study of the way in 
which scores are distributed in the WAIS- 
III standardization sample. Most psychome¬ 
tric models assume that the scores are dis¬ 
tributed in accordance with a multivariate 
normal distribution, but that is not nec¬ 
essarily the case. The profiles defined by 
the WAIS subtests could fall into patterns 
of similar scores, rather than being dis¬ 
tributed smoothly across the mathematical 
space defined by possible test scores. For a 
geographic analogy, people's residences in 
the US are not characterized by a smooth 
distribution across the country; homes are 
clustered into cities and towns. 

A technique known as cluster analysis has 
been used to identify five patterns in the 
WAIS standardization. 71 The clusters dif¬ 
fer on two variables: overall performance 
level and whether people within the clus¬ 
ter obtain better scores on perceptual speed 
measures than would be expected given 
their overall level of performance. Women 
do not differ from men in overall perfor¬ 
mance, but they are overrepresented in clus¬ 
ters associated with high perceptual speed, 
which in effect means rapid recognition of 
simple visual figures. 

71 Donders, Zhu, & Tulsky, 2001. 
Table 11.3. Male-female standard deviation unit (d) scores for effect size for different
aspects of intelligence. The NLSY79 survey reported two scores for mathematics:
arithmetical reasoning (use of numerical reasoning in problem solving) and mathematical
knowledge. The same survey also reported two scores for the speed of conducting simple
cognitive operations. One was a simple decoding operation; the other evaluated the
examinee's speed of executing elementary numerical operations.

Survey          Test              Reading                Nonverbal   Spatial   Perceptual    Assoc.
Code            Date    N         Comp.    Mathematics   Reasoning   Ability   Speed         Memory

Project Talent  1960    73,425    -.15     .12           .04         .13       n.a.          -.32
NLS-72          1972    16,860    -.05     .24           -.22        n.a.      -.23          -.26
HS&B            1980    25,069    .002     .22           n.a.        .25       -.21          -.18
NLSY79          1980              -.18     .26, .08      n.a.        n.a.      -.43, -.23    n.a.
NELS:88         1992    24,599    -.09     .03           n.a.        n.a.      n.a.          n.a.

Source: Data is from Hedges and Nowell, 1995, Tables 1 and 2.


A similar pattern has been found in 
studies of nationally representative ado¬ 
lescent populations. Table 11.3 shows the 
results from five national surveys of people
who were tested when they were in
high school, and who have since been fol¬ 
lowed through their early adult careers. 72 
While there is some discrepancy between 
the results, which is probably due to dif¬ 
ferences in content between different tests, 
two trends stand out. Women do better 
than men in tests of reading comprehen¬ 
sion, speed of simple perceptual operations, 
and tests of associative memory, in which 
examinees have to recall arbitrary associa¬ 
tions, such as associating a picture and a 
number. Men do better than women on tests 
of visual-spatial reasoning and mathematics. 

The results from the WAIS III indicate 
that in the adult years the female advan¬ 
tage in verbal operations disappears, but 
the advantage in tasks involving rapid, sim¬ 
ple perceptual recognition and execution 
of simple cognitive operations (processing 
speed) remains. 

These conclusions refer to very broadly 
defined abilities. Two psychometric research 
studies provide further detail, and, in the 
case of the second study, establish a theo¬ 
retical framework for thinking about these 
results. 


The Differential Aptitude Battery (DAT) 
is a battery of tests developed by the 
Educational Testing Service for research 
purposes. Figure 11.11 presents male-female 
differences on some of the subtests, calcu¬ 
lated for four standardizations in the United 
States, spanning just over forty-three years. 
Results quite similar to those shown in Fig¬ 
ure 11.11 have been obtained in Spain and in 
the United Kingdom. 73

The consistency of the results is impres¬ 
sive. Women do somewhat better on tests 
of language skills (as opposed to reason¬ 
ing about verbally presented material) and 
on tests of speed and accuracy of simple 
operations. Men do markedly better on tests 
involving the manipulation of visual images. 
Men also do slightly better on tests of verbal 
and abstract reasoning. 

Johnson and Bouchard have placed 
results similar to these in the context 
of the g-VPR model. 74 In previous work 
(reviewed in Chapter 4, section 5) John¬ 
son and Bouchard had shown that the g- 
VPR model provides a good fit to the MISTRA
data (Chapter 8). They then removed
the variance in scores associated with the g 
dimension, and analyzed the residual scores 
on the various tests. The residuals can be 
thought of as being the variation in test 


72 Hedges & Nowell, 1995. 


73 Colom & Lynn, 2004; Strand, Deary, & Smith, 2006. 

74 Johnson & Bouchard, 2007a,b. 
Figure 11.11. Male-female differences in d units for selected subtests
of the Differential Aptitude Battery, shown over standardizations 
from 1947 to 1990. 


scores that cannot be ascribed to variations in 
general intelligence. The residual variation 
could be characterized by three orthogonal 
dimensions. The smallest of these, in terms 
of variance accounted for, was a “memory 
for content of passages” factor. This will not 
be discussed further. The other two factors 
are more interesting. 
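
The idea of "residual scores" can be made concrete with a small sketch. It is a simplified stand-in, assuming a single observed estimate of g and ordinary least squares regression; it is not claimed to be the procedure used in the study, and the data and the names `residualize` and `covariance` are hypothetical.

```python
from statistics import mean

def residualize(scores, g):
    """Return the part of each score not linearly predictable from g."""
    g_bar, s_bar = mean(g), mean(scores)
    sxx = sum((x - g_bar) ** 2 for x in g)
    sxy = sum((x - g_bar) * (y - s_bar) for x, y in zip(g, scores))
    slope = sxy / sxx
    intercept = s_bar - slope * g_bar
    return [y - (intercept + slope * x) for x, y in zip(g, scores)]

def covariance(a, b):
    """Population covariance of two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    return mean([(x - a_bar) * (y - b_bar) for x, y in zip(a, b)])

# Hypothetical toy data: a g estimate and two test scores for six people.
g = [-1.5, -0.5, 0.0, 0.5, 1.0, 1.5]
vocabulary = [-1.0, -0.9, 0.4, 0.2, 1.4, 1.1]
rotation = [-1.6, 0.1, -0.4, 0.9, 0.4, 1.8]

vocab_resid = residualize(vocabulary, g)
rot_resid = residualize(rotation, g)

# By construction the residuals carry no linear trace of g.
print(abs(covariance(vocab_resid, g)) < 1e-9)
print(abs(covariance(rot_resid, g)) < 1e-9)
```

Whatever structure remains among such residuals is, by construction, unrelated to the g estimate that was removed, which is the sense in which the residual dimensions describe differences "after general intelligence has been accounted for."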

Figure 11.12 is a graphic model of Johnson 
and Bouchard’s results, showing the two fac¬ 
tors and their relation to the general intelli¬ 
gence (g) factor. The two residual factors 
are bipolar factors, in the sense that tests 
tend to have either high or low loadings on 
each of them. Johnson and Bouchard refer 
to these two dimensions as verbal-rotational 
and focused-diffuse.

The bipolar factors need special interpre¬ 
tation, for they are not quite the same as 
the ability factors identified in the original g- 
VPR model. Vocabulary tests and rotational 
tests were good markers for the respective 
ends of the verbal-rotational factor. What 
the result says is that in the MISTRA sample
(adults!), after general intelligence had been
accounted for, people who knew lots of words
tended to be poor at manipulating mental 
images, and vice versa. The focused-diffuse 
dimension contrasted people who did well 
on tasks that require concentration on visual 
diagrams with people who did well on tasks 
that involve comprehension of verbal argu¬ 


ments and possession of information about 
the world. 

Johnson and Bouchard measured male- 
female differences along each of the dimen¬ 
sions. They found only small, nonreliable 
differences on the g dimension. There were 
large differences (d values larger than .5) on
the subsidiary dimensions. Women tended 
strongly toward the verbal ends of both 
bipolar dimensions, and tended to have 
superior memory for information presented 
during the testing session. Men tended to 
have higher scores on the “focused” (on 
visual objects) and “rotational” ends of the 
bipolar dimensions. Once again, though, it 
is important to remember that these differ¬ 
ences refer to performance after removing 
variation due to general intelligence. 

Johnson and Bouchard have offered an 
interesting summary and interpretation of 
their results. 75 They argue that whenever 
a person solves a problem he or she does 
so by combining general reasoning ability 
(i.e., g) with the particular mental tools they
have on hand. For instance, many osten¬ 
sibly visual-spatial problems can also be 
solved by verbal reasoning. The results just 
cited indicate that men and women dif¬ 
fer somewhat in the quality of their men¬ 
tal tools; the verbal tools tend to be better 

75 Johnson & Bouchard, 2007b.

[Figure 11.12 labels: General intelligence (+). Rotation: tasks that involve manipulation of mental images. Tasks that require integrating information from multiple sources or analyses, e.g., paragraph comprehension, identifying similarities between objects.]

Figure 11.12. Johnson and Bouchard's model of general intelligence
plus two bipolar factors (verbal-rotational and focused-diffuse). The
bipolar factors represent a trade-off between opposing strengths.


for women, and the mental imaging and 
attention-focusing tools tend to be better for 
men. Rational application of general intel¬ 
ligence would thus lead men and women 
to adopt somewhat different strategies for 
problem solving. Therefore, even though 
men and women are essentially equal in gen¬ 
eral intelligence, they may differ in their 
performance on particular tests, because 
performance depends upon both general 
intelligence (for which there is no sex dif¬ 
ference) and the residual abilities, where sex 
differences may be substantial. 



Figure 11.13. Illustrations of two visual-spatial 
reasoning tasks on which men tend to 
outperform women. See text for details. 


Both the national surveys and the psy¬ 
chometric research studies indicate that the 
big differences between men and women are 
on perceptual and spatial-visual reasoning 
tasks - the P and R dimensions of the g- 
VPR model. (One could say much the same 
thing about the PS and Vz dimensions of 
the three-stratum model.) Laboratory stud¬ 
ies amplify upon these results. Men tend 
to be better than women at tasks involv¬ 
ing the manipulation of mental images. The 
prototypical example is a mental rotation 
task, in which two figures must be compared 
by moving them about “in the mind's eye.” 
Women take longer to do this, on the aver¬ 
age, and make more errors. The second type 
of task on which men's performance is gen¬ 
erally better than women’s performance is a 
field-independence task, where an observer 
is required to orient an object to the true ver¬ 
tical or horizontal, overcoming the effects of 
framing figures. Examples of these two types 
of task are shown in Figure 11.13. Men also do 
better than women on tasks involving judg¬ 
ment of real or imagined motion. 76 

76 Hunt et al., 1988; Kimura, 1999, Chapter 5; Law, 
Pellegrino, & Hunt, 1993. 



















Figure 11.14. Detecting details in a complex 
picture. Count the triangles in the figure above. 
This task requires analysis of a static visual 
figure. Women tend to outperform men on this 
sort of task. 

By contrast, there are many spatial-visual 
tasks that do not involve any of these skills. 
For instance, women tend to outperform 
men on tasks that require analysis of static 
visual figures. An example, where the task 
is to find a component within a picture, is 
shown in Figure 11.14. 

A possibly related finding is that males do 
better than females in way finding, the abil¬ 
ity to find locations and to maintain aware¬ 
ness of positions in the environment. This 
is true of both children and young adults, 
and appears to be related to performance 
on spatial orientation tasks, although the 
relationship is not a strong one. This result 
holds for imagined routes through environ¬ 
ments, actual orienteering, and acquisition 
of knowledge of the environment through 
interaction with computer-generated virtual 
environments. 77 

The skills evaluated by these tasks have 
applications in everyday life. There are very 
large individual differences in our ability to 
find our way about the world. And, as would 
be expected from Johnson and Bouchard’s 
analysis, way finding can be accomplished 
by visualizing the surrounding environment 
or by using strategies that rely on verbal 
memory. Men are more likely to use the first 
strategy, women the second. 78 There are also 
a number of practical situations in which 

77 Choi & Silverman, 2003; Malinowski, 2001; Waller, Knapp, & Hunt, 2001.

78 See Hunt, 2002, for a review of the evidence. 


a person must visualize motion or changes 
in perspective while viewing a static dis¬ 
play. These vary from analyzing gear trains 
to assembling objects from diagrammatic 
instructions - a task that will be familiar and 
perhaps frustrating to anyone who has pur¬ 
chased to-be-assembled furniture kits. 

Such observations do not mean that 
“women can’t (find their way in the woods, 
read maps, be architects, fly helicopters ...)
as well as men.” The following qualifications 
should be kept in mind. 

Virtually all tasks we encounter in daily 
life admit to multiple solutions. General 
intelligence is a far better predictor of 
performance than rotational ability, even 
though both are important. People can 
apply general intelligence to develop a 
problem-solving strategy that suits their par¬ 
ticular strengths. 

The claims for male superiority in rota¬ 
tion and related visual-spatial abilities refer 
to statistical trends. Assuming a deviation 
difference on a rotation task of .5, we would 
still expect 30% of the women to outper¬ 
form 50% of the men. The differences in the 
frequencies of men and women, in favor of 
men, would be more extreme at higher lev¬ 
els of performance. 
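
The overlap figure in this paragraph can be checked with a short calculation. The sketch below is an illustration only, assuming that scores for both sexes are normally distributed with equal variance and that the male mean is .5 standard deviations higher; the function names are hypothetical.

```python
from statistics import NormalDist

def share_of_women_above_male_percentile(d, male_percentile=0.5):
    """Fraction of women scoring above a given percentile of the male distribution.

    Illustrative assumption: men ~ N(d, 1) and women ~ N(0, 1).
    """
    men = NormalDist(mu=d, sigma=1.0)
    women = NormalDist(mu=0.0, sigma=1.0)
    cutoff = men.inv_cdf(male_percentile)
    return 1.0 - women.cdf(cutoff)

def male_to_female_ratio_above(cutoff, d):
    """Ratio of the male to the female proportion scoring above a cutoff."""
    men = NormalDist(mu=d, sigma=1.0)
    women = NormalDist(mu=0.0, sigma=1.0)
    return (1.0 - men.cdf(cutoff)) / (1.0 - women.cdf(cutoff))

# Share of women who outscore the median man when d = .5.
print(round(share_of_women_above_male_percentile(0.5), 3))

# The male:female ratio grows as the performance cutoff rises.
for cutoff in (0.0, 1.0, 2.0):
    print(cutoff, round(male_to_female_ratio_above(cutoff, 0.5), 2))
```

Under these assumptions roughly 31% of women exceed the median man, and the ratio of men to women above a cutoff increases steadily as the cutoff moves further into the upper tail.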

Visual-spatial abilities, like virtually all 
abilities, can be acquired through training. 
This is particularly true if the training can 
be focused on a narrow set of skills required 
for a particular task. On the average, women 
do not read maps as well as men. 79 Map 
reading is taught all the time, to both men 
and women, in contexts varying from sports 
orienteering to military training. The visual 
rotation task (Figure 11.13) was described as 
the prototypical task for illustrating male- 
female differences in rotation. Similarly, it 
can be used to illustrate age differences in 
rotation, which can be substantial over the 
adult years. People can be trained to per¬ 
form this task. In one study in my own lab¬ 
oratory women in their fifties were trained 
to the level of performance typical of male 
undergraduates. 80 These are only two of 

79 Boardman, 1990. 

80 Berg, Hertzog, & Hunt, 1982. 
several examples that could be given of the
acquisition of visual-spatial skills through 
training. 81 

Individual differences are generally not 
eliminated by training, although they may 
be reduced. This is what happened in the 
study in my laboratory; both men and 
women got better after five days of train¬ 
ing, but there were still differences between 
them in their ability to do the spatial rota¬ 
tion task. Cognitive traits are not invariant, 
in the sense that height is. They are more 
like weight: they can be modified, but it can 
take a lot of work to do so. The amount of 
training required to reach a fixed level of 
performance will probably depend upon a 
person's initial performance level. 

11.3.5. Male-Female Differences in
Cognitive Traits Relevant to Education 

This section discusses cognitive differences 
between men and women that have a direct 
impact upon education. The issue is impor¬ 
tant, because different educational avenues 
lead to markedly different careers in adult¬ 
hood. 

Psychologists and educators think about 
individual differences in different ways. Psy¬ 
chologists are interested in tasks that maxi¬ 
mize individual differences in human behav¬ 
ior, try to characterize these differences, 
and, especially in recent years, try to relate 
behaviorally defined traits to biological sys¬ 
tems. To illustrate, one of Johnson and 
Bouchard’s arguments for preferring the g- 
VPR model of intelligence to the Gc-Gf 
model is that the behavioral distinctions 
in the g-VPR model map onto distinctions 
between brain systems, while those in the 
Gc-Gf model do not. 

By contrast, educators think in terms of 
subject matter. They want to talk about cog¬ 
nitive traits associated with educational con¬ 
tent. The biggest division of the curricu¬ 
lum is into topics that are broadly associated 
with language arts and topics associated with 
mathematics. Accordingly, to an educator 

81 See Newcombe, 2007, for a further discussion of this issue.


the most interesting individual differences 
are differences in the ability to deal with 
language and mathematics. "General reason¬ 
ing" is too amorphous a concept, and, to an 
educator, perceptual and visual-spatial skills 
seem too microscopic. 

We need to coordinate these two views, 
paying special attention to the ways in which 
male-female differences in cognitive experi¬ 
ences impact on education. A large study 
from Germany provides a good place to 
begin. 82 

Martin Brunner and two colleagues at 
the Max Planck Institute conducted a psy¬ 
chometric analysis of the data from a 
German testing program involving over 
29,000 students, randomly selected from the 
seventeen-year-olds in the German school 
system. The study was exceptional both for 
the representativeness of the sample and 
for the care that the investigators paid to 
the technical issues concerning group differ¬ 
ences that were described in the introduc¬ 
tion to this chapter. They concluded that 
the data was best fit by a hierarchical model, 
consisting of a general factor (g) and a nested 
factor model, in which mathematics or verbal
(reading) abilities contribute to overall
mathematics or reading test scores. The
traits of interest to educators, language and 
mathematical abilities, appear as specializa¬ 
tions of a general reasoning factor, which is 
not a targeted educational variable. 

Brunner and his colleagues then exam¬ 
ined male-female differences along each 
dimension. The seventeen-year-old girls 
slightly outperformed the boys on the general
reasoning and reading factors (d = -.09
for both comparisons), but the boys 
markedly outperformed the girls on the 
mathematical factor (d = .94). This con¬ 
forms to the popular belief that boys are 
better than girls at mathematics, although, 
as we will show, this finding ought to be 
qualified. 

Brunner and his colleagues then con¬ 
sidered what their findings might imply 
for mathematics. They argued that math¬ 
ematical problems are attacked with a 

82 Brunner, Krauss, & Kunter, 2008. 
Figure 11.15. Male-female differences on various educational tests
of reading and mathematics. All studies utilized large probability 
samples of the relevant US population. Positive numbers indicate a 
male advantage; negative numbers indicate a female advantage. 
Data from Hedges and Nowell, 1995, Table 2. 


combination of general reasoning skills, 
in which men and women are essentially 
equivalent, and mathematics-specific skills, 
in which men, on the average, exceed 
women. As a result, men should do better 
than women in those areas of the curriculum 
that emphasize mathematics, providing that 
the mathematics involved is sufficiently spe¬ 
cialized to emphasize mathematical rather 
than general reasoning skills. 

Applying Brunner and colleagues’ rea¬ 
soning to the typical educational progres¬ 
sion, what we should see is a progressive 
sharpening of differences between men and 
women in educational accomplishment as 
we move from the general education cur¬ 
riculum through the undergraduate univer¬ 
sity years, and then on to specialized educa¬ 
tion and career achievements in the Science, 
Technology, Engineering, and Mathematics 
(STEM) fields. That is what happens, but 
there are some important qualifications. 

WHAT MEN AND WOMEN GET OUT 
OF THE K-12 SYSTEM 

In the United States public education is 
available to everyone from kindergarten 
through the twelfth grade. While there is 
some variation from state to state, atten¬ 
dance is usually compulsory through age six¬ 


teen, and students are very strongly encour¬ 
aged to complete the entire course, graduat¬ 
ing at seventeen or eighteen. Similar public 
education programs exist in other developed 
countries. 

Figure 11.15 presents a comparison of the 
scores achieved by high school age boys 
and girls on the cognitive tests used in sev¬ 
eral Department of Labor surveys during 
the last half of the twentieth century. 83
Throughout this period girls consistently 
outscored boys on tests involving language 
use, while boys outscored girls by a some¬ 
what larger margin on tests of mathemati¬ 
cal skills. Similar results have been obtained 
by the National Assessment of Educational 
Progress (NAEP). This test has been called 
“the nation’s report card” for the evaluation 
of language, mathematics, and science skills. 
Somewhere between 70,000 and 100,000 stu¬ 
dents are evaluated each year. In the twelfth 
grade girls outscore boys in reading, while 
boys outscore girls in mathematics. 84 

International results are similar. The Pro¬ 
gram for International Student Assessment 
(PISA) is a program in which representative 

83 Hedges & Nowell, 1995.

84 “Nation’s Report Card,” National Center for Education Statistics.
Figure 11.16. The difference between male and female scores on
three of the 2003 PISA examinations. Differences are expressed in
deviation units using a nominal standard deviation of 100, which is
the intended standard deviation for the PISA examinations.


schools are selected, and fifteen-year-old 
children are evaluated on tests involving 
reading, mathematics, science, and prob¬ 
lem solving. The problem-solving section 
presents realistic problems that do not 
depend upon specific academic knowledge, 
such as finding an efficient route on a bus 
line. Figure 11.16 shows the male-female 
deviation scores on the PISA reading, math¬ 
ematics, and problem-solving tests, calcu¬ 
lated for the United States and six other 
economically developed countries. The 
consistency in the pattern is striking. Across 
all countries males exceed females by a 
small amount (d’s consistent, but less than 
.1) in mathematics, while females exceed 
males by a larger amount (d’s around .3) in 
reading. On the problem-solving test, where 
an attempt was made to set problems that 
did not depend upon reading or mathe¬ 
matics skills, there was virtually no gender 
difference. 

Figure 11.16 emphasizes the consistency 
of differences between boys and girls, but 
does not indicate the fact that the abso¬ 
lute level of achievement varies a great deal 
across countries. This variation is shown in 
Figure 11.17. The same pattern is evident in 
the science scores - small but consistent 
male-female differences and much larger 
differences between countries. Fifteen-year- 
old girls in Australia, Canada, Germany, 
Japan, and Sweden have higher mathemat¬ 


ics and science scores than fifteen-year-old 
boys in Spain and the US. 

One more point in the international data 
is worth noting. Because males typically dis¬ 
play greater variance in scores than females, 
we would expect the differences between 
fifteen-year-old boys and girls to be greater 
at high levels of performance than they are on
the average. This is the case. Averaged over 
countries, the d for male-female median 
mathematics scores is .10. At the ninety-fifth 
percentile it is .20. 85 
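
A rough calculation shows how modest a difference in variance is needed to produce this pattern. The sketch assumes, purely for illustration, that both score distributions are normal and that gaps are expressed in units of the female standard deviation; the function names are hypothetical and the result is not taken from the cited study.

```python
from statistics import NormalDist

def gap_at_percentile(d_median, male_sd, p, female_sd=1.0):
    """Male minus female score at the same percentile p of each distribution."""
    men = NormalDist(mu=d_median, sigma=male_sd)
    women = NormalDist(mu=0.0, sigma=female_sd)
    return men.inv_cdf(p) - women.inv_cdf(p)

def male_sd_for_gap(d_median, target_gap, p, female_sd=1.0):
    """Male SD that yields target_gap at percentile p, given the median gap.

    Under normality the gap at percentile p is d_median + z_p * (male_sd - female_sd).
    """
    z_p = NormalDist().inv_cdf(p)
    return female_sd + (target_gap - d_median) / z_p

# Median gap of .10 and a gap of .20 at the ninety-fifth percentile.
sd_needed = male_sd_for_gap(0.10, 0.20, 0.95)
print(round(sd_needed, 3))
print(round(gap_at_percentile(0.10, sd_needed, 0.95), 3))
```

Under these assumptions a male standard deviation only about 6 percent larger than the female one is enough to double the gap between the median and the ninety-fifth percentile.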

The data for K-12 educational achieve¬ 
ment presents a remarkably consistent pic¬ 
ture. At the end of the standard (and often 
compulsory) school period mid-teenage girls 
have greater language related skills than boys 
of their age, while the opposite is true of 
skills and knowledge in science and math¬ 
ematics. In terms of deviation units the 
female advantage in language is greater than 
the male advantage in science and mathe¬ 
matics. It is important to remember that this 
is a statement about the extent to which the 
male and female score distributions differ 
with respect to each other. It is not a state¬ 
ment of differences in absolute skill, because 
there is no metric by which we can compare 
a difference in mathematical knowledge to 
a difference in language skill. 

85 Machin & Pekkarinen, 2008. 
Figure 11.17. Mean scores of fifteen-year-old students in selected
countries on the PISA 2003 tests of mathematics, science, and
general problem solving. Data from the National Center for
Education Statistics, Digest of Education Statistics, 2007,
Table 389.


COLLEGE AND UNIVERSITY
UNDERGRADUATE EDUCATION

College and university students represent 
an important population, because this pop¬ 
ulation of young adults contains most of 
the people who will be leaders of soci¬ 
ety, both in the dramatic sense of provid¬ 
ing a few highly visible leaders, and in the 
perhaps more important sense of providing 
the many business managers, entrepreneurs, 
technicians, and professionals who will con¬ 
stitute the economically and socially most 
productive segments of society. 86 As is well 
known, in the last fifty years there has been 
a tremendous expansion of social and

86 Gelade, 2009.


economic opportunity for women within the
college-educated segments of society. There 
is considerable reason to be interested in dif¬ 
ferences between the cognitive skills of men 
and women within this group. 

It is important to remember, though, that 
results obtained by studies of people who are 
either in or about to enter undergraduate 
education do not necessarily generalize to 
the population at large. Since the early 1980s 
more women have enrolled in college than 
men. As of 2008, there were approximately 
three undergraduate men for every four 
women. This situation arises because at a 
given level of high school academic achieve¬ 
ment a woman is more likely to take the 
first steps toward undergraduate education 
Figure 11.18. Mean SAT mathematics scores for men and women,
from 1988 to 2008. The d values for these comparisons range from 
.33 to .41. Source: College Board Report, College Bound Seniors, 2008. 


than a man is. 87 Accordingly, female under¬ 
graduates are a somewhat academically less 
select subpopulation of all women than male 
undergraduates are of all men. While this is 
not quite accurate, you can think of the two 
populations as being roughly the top 45% of 
men and the top 55% of women. 

One implication of this is that male- 
female differences will be algebraically 
larger in the undergraduate population than 
in the high school population. The effect 
will be further exacerbated by the vari¬ 
ance effects described earlier. Men’s scores 
will be increased, overall, because of a high 
male:female ratio in the population of peo¬ 
ple who have high test scores. The excess 
of men with low test scores will not affect 
the undergraduate means, because people 
with low test scores are unlikely to enroll in 
undergraduate programs. 
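
The size of the selection effect can be sketched numerically. The example below is a deliberate simplification: it assumes normally distributed scores with equal variance, and it assumes that enrollment selects strictly on the test score itself (the top 45% of men and the top 55% of women), which real admissions do not do. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def mean_of_top_fraction(fraction, mu=0.0, sigma=1.0):
    """Mean of the top `fraction` (0 < fraction < 1) of a normal distribution."""
    z = STD_NORMAL.inv_cdf(1.0 - fraction)
    return mu + sigma * STD_NORMAL.pdf(z) / fraction

def selected_gap(d_population, male_fraction=0.45, female_fraction=0.55):
    """Male minus female mean among those selected, in population SD units."""
    men = mean_of_top_fraction(male_fraction, mu=d_population)
    women = mean_of_top_fraction(female_fraction, mu=0.0)
    return men - women

# Even with identical population distributions, selection opens a gap.
print(round(selected_gap(0.0), 2))

# A population gap of .1 becomes larger among those selected.
print(round(selected_gap(0.1), 2))
```

Under these assumptions differential selection alone adds about .16 population standard deviation units to the male-female difference, before any variance effect is considered.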

These trends are seen in the test scores of 
entering students. In the high school popu¬ 
lation, as a whole, the women’s mean read¬ 
ing score is above the men’s mean by about 
.2 standard deviation units, and men exceed 
women in mathematics test scores by about 
.1 standard deviation units. The 2008 SAT 
reading scores were 504 for men and 500 
for women, while the mathematics scores 
were 533 for men and 500 for women. Using
the nominal 100-point standard deviation for 
SAT sections, this translates into a trivial .04 

87 Hunt & Madhyastha, 2008. 


d male advantage in reading, and a nontriv¬ 
ial .33 d advantage in the mathematics (now 
renamed “reasoning”) portion of the test.
This is the sort of magnification of male- 
female differences that would be expected 
because of the differential recruitment of 
men and women from the high school to 
the undergraduate population. 
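
The conversion from SAT means to d units is simple arithmetic, sketched here for readers who want to repeat it with other years. The 100-point standard deviation is the nominal value used in the text, not an observed one, and the function names are hypothetical.

```python
def effect_size(male_mean, female_mean, sd=100.0):
    """Male minus female mean, expressed in standard deviation (d) units."""
    return (male_mean - female_mean) / sd

def mean_gap_for(d, sd=100.0):
    """Difference in means (in score points) implied by a given d."""
    return d * sd

# 2008 reading means quoted in the text: 504 for men, 500 for women.
print(round(effect_size(504, 500), 2))

# A d of .33 on a 100-point scale corresponds to a 33-point gap in means.
print(round(mean_gap_for(0.33), 1))
```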

The male-female discrepancy in SAT 
mathematics scores is not a recent phe¬ 
nomenon. Figure 11.18 shows the SAT math¬ 
ematics (SAT-M) scores of college-bound 
seniors from 1988 until 2008. Although the 
levels of scores for both men and women 
vary from year to year, the difference 
between means is remarkably consistent. 

In 2005 an approximately equal num¬ 
ber of men and women earned bachelor’s 
degrees in Science, Technology, Engineer¬ 
ing, and Mathematics (in the jargon of the 
US National Science Foundation, the STEM 
fields). This was a change from thirty years 
earlier, when the ratio of men to women 
earning bachelor's degrees in these fields was 
approximately 2:1. 88 However, because the 
undergraduate male:female ratio is 3:4, the 
1:1 ratio in the STEM fields implies that a 
man is about 4/3 as likely to major in a
STEM field as a woman is. The extent to
which this disparity is due to cognitive com¬ 
petence, personal interests, or social pres¬ 
sures is impossible to say. 
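
The 4/3 figure follows directly from the two ratios, as this small sketch shows. Only the ratios from the text are used; the function name and argument names are hypothetical.

```python
from fractions import Fraction

def relative_major_rate(enrolled_men, enrolled_women, degrees_men, degrees_women):
    """How many times as likely a man is to earn the degree as a woman is.

    Arguments may be counts or the terms of a ratio; only proportions matter.
    """
    male_rate = Fraction(degrees_men, enrolled_men)
    female_rate = Fraction(degrees_women, enrolled_women)
    return male_rate / female_rate

# Undergraduate enrollment 3:4 (men:women), STEM bachelor's degrees 1:1.
print(relative_major_rate(3, 4, 1, 1))
```

With a 3:4 enrollment ratio and a 1:1 degree ratio, the per-capita rate for men is 4/3 that for women.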

88 Source: National Science Foundation. 
Figure 11.19. The relation between SAT-M scores, grades (A, B, C,
D, E), and gender in three levels of college courses. Within course,
men have higher SAT-M scores than women who have comparable 
grades. Source: Wainer & Steinberg, 1992. 


Women are not entering the STEM fields 
and then dropping out because they find 
the work too difficult. Given constant SAT- 
M scores, women consistently outperform 
men in mathematics courses. 89 The difference
can be striking. Figure 11.19 shows the
results from a very large (N ~ 49,000) study 
of college students in a variety of Amer¬ 
ican universities. 90 SAT-M scores increase 
as a function of the level of mathematics 
involved. This reflects an unsurprising fact:
people with high SAT-M scores are more 
willing to enroll in mathematics courses than 
people with low scores. Within each type 
of course, SAT-M scores increase as the 
course grade increases. This shows that the 
SAT-M is a valid predictor of accomplish¬ 
ments in mathematics classes. Both of these 
trends hold for men and for women. 

But then we come to a paradox. Within
each course, women receive the same grades
as men who have higher SAT-M scores. Put
another way, at a given SAT-M score women
earn higher grades. In terms of educational
outcomes the difference can be
substantial. Women who receive Bs have 
SAT-M scores lower than the men who 
receive Cs and Ds in the same class. This is 
a striking example of a general tendency for 

89 Clark & Grandy, 1984; Lynn & Mau, 2001; Ramist, 
Lewis, & McCamley-Jenkins, 1994; Wainer & Stein¬ 
berg, 1992. 

90 Wainer & Steinberg, 1992. 


SAT scores to underpredict women’s educa¬ 
tional achievements in the early undergrad¬ 
uate years. 

I have heard three explanations offered for
the paradox. One is that the test is
an objective measure of mathematics abil¬ 
ity, while grades are a subjective measure 
based on the instructors’ decisions. There¬ 
fore, men “really” possess more mathemat¬ 
ics ability, but women are able to present a 
more favorable impression to instructors. 

This explanation strikes me as a plausible 
argument for low correlations between test 
scores and grades in courses where there is 
a substantial subjective component to grad¬ 
ing, such as a course in English literature, 
but I do not see how the argument applies to 
lower-division college mathematics courses, 
where right answers are clearly defined. 

A second argument is that women simply 
work harder to get grades. 

The most probable explanation is that 
women's stronger work motivation com¬ 
pensates for their lower test scores. 

- Lynn & Irwing, 2004a, p. 495

Presumably what Lynn and Irwing meant 
was “lower ability as indexed by test scores” 
(which in this case referred to progressive 
matrix tests rather than the SAT), rather 
than to the test scores themselves, as the 
test scores per se are not involved in grade 
assignment. The motivational argument is 
not unreasonable, for women do report
doing more homework than men. 91 How¬ 
ever, this implies a false dichotomy between 
intelligence and motivation as evidenced by 
behavior as a student. Talents such as good 
time management and establishing priori¬ 
ties among goals are part of intelligence, in 
the conceptual sense, as much as the talents 
required for taking tests. 

A third possibility is that tests of math¬ 
ematical aptitude, such as the SAT-M, are 
influenced by a psychological trait on which 
there are sex differences, but this trait either 
does not contribute to mathematical perfor¬ 
mance outside the test, or does so, but has 
more influence on test performance than it 
does on in-class performance. Visual-spatial 
ability, the R dimension of the g-VPR model,
has been suggested as a possibility, both for 
the SAT and for progressive matrix tests. 92 
Alternatively, there might be some ability 
that is not evaluated by the test, but that is 
important in the study of mathematics and 
is possessed by women more than by men. 
My comment about time management is an 
example of such an explanation. 

When all is said and done, though, we just 
do not know what the link is between male- 
female differences in test performance and 
in performance in mathematics. The prob¬ 
lem becomes more acute as we look at dis¬ 
crepancies between men and women at a 
higher level of analysis, the pursuit of careers 
in the STEM fields. 

POSTGRADUATE EDUCATION
AND CAREER DEVELOPMENT

It is difficult to say anything succinct 
about male-female differences in postgrad¬ 
uate education and career development in 
general, because any statement has to be 
qualified by considering the field involved. 
Postgraduate education is itself so varied 
that general statements about how men and 
women progress through curricula as differ¬ 
ent as mathematics, education, medicine, 
and the law are equally suspect. What we 
can do, however, is to look at some of the 

91 Mau & Lynn, 2000, 2001. 

92 Casey et al., 1995; Lynn & Irwing, 2004a. 


highly publicized differences in outcomes 
between men and women. Socially, what 
has been of particular concern is the scarcity 
of women in the Science, Technology, Engineering,
and Mathematics (STEM) fields.

The disparity is longstanding. Charles 
Murray's statistical survey of five thousand 
years of human accomplishment uncovered
very few eminent women scientists or
mathematicians. 93 This is hardly surprising, 
owing to restrictions on women’s activities 
that were enforced by various human soci¬ 
eties until modern times, and even now out¬ 
side the industrially developed countries. 
Within these countries lifting of the restrictions
is quite recent. In 1925 Cecilia Payne-Gaposchkin
(1900-1979) became the first
woman to receive a doctorate (in Astron¬ 
omy, a STEM field) from Harvard Univer¬ 
sity. A few years later, in 1934, Grace Hopper 
(1906-1992) became the first woman STEM 
graduate (in Mathematics) at Yale Univer¬ 
sity. (Hopper later developed COBOL, one 
of the early computer programming lan¬ 
guages, and was the first person to use the 
term “bug” to describe an error in pro¬ 
gram execution.) In 2007 Drew Gilpin Faust 
became the first woman president of Harvard.
Hanna Holborn Gray served as acting
president of Yale for one year in 1977. Things 
have changed, but perhaps not at breakneck 
speed. Harvard was founded in 1636, Yale 
in 1701. 

The national picture is similar to that at 
Harvard and Yale. Figure 11.20 shows the 
percentages of women receiving doctorates 
in several relevant fields, contrasting 1977 
and 2007. There has clearly been a great 
increase in the number of women receiv¬ 
ing advanced degrees. However, the pat¬ 
tern for type of degree has changed very 
little over a thirty-year period marked by 
major advances in opportunities for women. 
Today women greatly outnumber the men 
who receive doctorates in Education, and 
are greatly outnumbered by them in Math¬ 
ematics, the mathematically oriented phys¬ 
ical sciences, and Engineering. 


93 Murray, 2003. 


Figure 11.20. The percentages of women among the recipients of 
doctoral degrees, 1977 and 2007. Source: National Science 
Foundation Survey of Earned Doctorates, 2007. 


Why have the higher reaches of the 
STEM fields remained so weighted toward 
men? In a widely publicized speech Faust's 
immediate predecessor at Harvard, the 
economist Larry Summers, proposed three 
reasons for the discrepancy. Two were 
social: the existence of conscious or uncon¬ 
scious prejudice on the part of hiring and 
promotion committees, and a preference 
for a personal lifestyle that is not compati¬ 
ble with the workaholic standards Summers 
associated with very high productivity in the 
STEM fields. The third was biological. Sum¬ 
mers correctly observed that there is a strik¬ 
ing disparity between the numbers of men 
and women who have very high scores on 
tests of mathematical ability and achieve¬ 
ment. He then speculated that women's 
brains are organized in a way that makes 
acquiring the high level of analytic skills 
needed in STEM research more difficult for 
women than for men. 94 

Summers’s remarks created such a 
firestorm that he subsequently resigned 
from the Harvard presidency. The incident 
is described in Panel 11.5. Going into all the 
possible social and biological reasons why 
women might be underrepresented in high- 
profile STEM positions is well beyond the 

94 See the earlier discussion of disparities between men 
and women in the SMPY program. For years men 
have outscored women on both the mathematical 
and verbal sections of the Graduate Record Exami¬ 
nation (Grandy, 1999). 


scope of this book. The controversy does 
provide a good entry point for a more gen¬ 
eral discussion of why there might be dif¬ 
ferences in intelligence between men and 
women. 

11.3.6. The Causes of Cognitive
Differences between Men and Women

Professionals in the field were far more 
nuanced in their reaction to Summers’s 
remarks than political figures and aca¬ 
demic leaders. The American Psychological 
Association (APA) and the Association for 
Psychological Science (APS), the two pro¬ 
fessional organizations most involved, com¬ 
missioned reports by people well known for 
their research in the field. The APA reports 
resulted in a book of contributed chapters, 
several of which I have cited. 95 The other 
report took up an entire issue in one of the 
APS's journals. Here are two sentences from 
its concluding paragraph. 

There cannot be any single or simple 
answer to the many complex questions 
about sex differences in math and science. 

Early experience, biological constraints, 
educational policy and cultural context 
each have effects, and these effects add and
interact in complex and sometimes unpre¬ 
dictable ways. 

- Halpern et al., 2007

95 Ceci & Williams, 2007. 
Panel 11.5. The Lawrence
Summers Affair 

On January 14, 2005, Lawrence Summers, 
at that time president of Harvard Uni¬ 
versity, gave a speech addressing the fact 
that women are underrepresented in the 
STEM fields, and even more underrep¬ 
resented at the top of those professions. 
Summers subsequently commented that 
he had been asked to be provocative. He 
was. 

Summers proposed three causes for 
the discrepancy: conscious or uncons¬ 
cious prejudice against women, women’s 
distaste for the intense professional com¬ 
mitments required to rise to the top of 
the STEM fields, and women’s difficulty 
in acquiring mathematical and scientific 
reasoning skills. Summers noted that a 
relatively small percentage of women 
achieve high scores on tests of mathemat¬ 
ical reasoning, and suggested that biolog¬ 
ical differences between men and women 
might contribute to the disparity in high- 
level mathematics skills. 

Several prominent women scientists 
left the meeting in protest. The high- 
visibility magazine Science published a 
letter signed by seventy-three prominent 
academics protesting his statements.* 
The Harvard faculty voted no-confidence 
in Summers, and in 2006 he resigned 
his post, returning to his position as 
a Professor of Economics. He hardly 
retired to pasture. In January of 2009 he 
was appointed Chair of the Presidential 
Council of Economic Advisors, making 
him the chief White House advisor on 
economic matters. 

Summers's remarks on women 
undoubtedly contributed to his more or 
less forced resignation, but it is unlikely 
that they were the only factors. By all 
accounts, Summers was a forceful exec¬ 
utive in an institution that was accus¬ 
tomed to a great deal of collegiality and 
decentralized decision making. During 
his tenure at Harvard he clearly “rubbed


a number of people the wrong way” as 
he sought what he perceived as urgently 
needed changes in an historic institu¬ 
tion.* As is often the case, the incentive 
to take drastic action stemmed from sev¬ 
eral causes. 

The politics behind Summers’s resig¬ 
nation are not relevant to our discussion 
of intelligence. The letter to Science is, 
because it indicates both the passions that 
are involved in discussions of group dif¬ 
ferences in intelligence and the beliefs 
held by highly influential people who 
have not studied the topic in depth. 

Here are two quotes from the letter to 
Science: 

There is little evidence that those scoring 
in the very top of the range in standardized
tests are likely to have more successful
careers in science education. . . .

And 

We are concerned by the suggestion that 
the status quo for women in science may 
be natural, inevitable, and unrelated to
social factors. 

Muller et al., 2005, p. 1043

Although the list of seventy-three sign¬ 
ers of the letter included prominent aca¬ 
demic scientists and science administra¬ 
tors, it did not include any of the major 
figures who do research on individual differences
in cognition. I doubt that many
of them would have signed the letter, for 
the first statement is demonstrably false. 
By 2005 the results of the Study of Math¬ 
ematically Precocious Youth (i.e., people 
whose SAT-M scores were in the top 1%) 
were well known to professionals in the 
field. These results document the stun¬ 
ning success of people whose scores were 
“in the very top of the range in standard¬ 
ized tests.” * 

What about the second statement? 
Summers never said anything about the 
status quo being natural, inevitable, or
unrelated to social factors. On the con¬ 
trary, he specifically listed two social 
factors that he thought contributed to 
the disparity: discrimination and conflicts 
between professional and family goals. 
The fact that Summers mentioned a pos¬ 
sible biological explanation for differ¬ 
ences in men's and women's intelligence 
was equated with a denial of social causes. 


Evidently the academic leaders who 
signed the Muller letter felt that they 
knew what the truth is, for they either 
felt no need to consult with experts in the 
field, who would have been easily avail¬ 
able to them, or they decided to disregard 
far more nuanced expert opinions. 

* Muller et al., 2005. 

* Bowley, 2006. 

* Park, Lubinski, & Benbow, 2006.


Let us look at some arguments for social 
and biological causes. 

SOCIAL CAUSES 

Summers thought that one of the reasons
that there are more men than women in 
the STEM fields is that work in these fields 
is simply more interesting to men than 
to women. If Summers were correct, men 
should have a higher participation rate than 
women in the STEM fields even within a 
population of talented men and women, 
where ability to enter the field is not an 
issue, but interest is. And there is such a 
group, the SMPY participants, people who, 
in their teens, were in the top 1% of their 
cohort in terms of academic ability tests. The 
SMPY participants were contacted twenty 
years later, when they were in their thirties, 
to see where their careers had gone. 

There were career differences between 
men and women. However, career choices 
were determined more by interest than by 
gender. It is a rough generalization, but not 
too far off the mark, to say that partici¬ 
pants who had primary interests in things 
and abstract ideas tended to follow careers in 
mathematics and the sciences. Participants 
who had interests in people and social issues 
followed careers in the humanitarian/social 
issues-oriented professions. Participants also 
differed in the extent to which they valued 
careers or families. Those participants who 
had strong family orientations were under¬ 
represented in the STEM professions, which 
are notoriously demanding of time. 


Men tended to fall more into the 
“things-ideas-career” pattern, while women 
tended toward the “people-family orienta¬ 
tion” pattern. 96 However, examples of each 
pattern occurred in both highly talented 
men and women. The codirectors of the 
SMPY, David Lubinski and Camilla Benbow,
point out that differences between
men’s and women’s interests alone would
create disparities in the extent to which men 
and women choose to work in the STEM 
fields. 97 

The SMPY participants had acquired the 
cognitive talents they needed, but differed in 
interests. What about the more general issue 
of how men and women come to display 
disparities in mathematical aptitudes, and in 
language skills and visual-spatial reasoning? 

Cognitive skills improve with practice, 
a psychological law that Diane Halpern 98 
said is as certain as the law of gravity. (She
was right.) Sex role differentiation is a fact 
in every society, although the restrictions 
placed on permissible gender roles have 
varied greatly both throughout history and 
across societies contemporaneously. There 
is considerable evidence that boys and 
girls do have different learning experiences 
that might impact on their acquisition of 
visual-spatial skills, and concomitantly of 

96 Benbow et al., 2000; Ferriman, Lubinski, & Benbow, 
2009; Lubinski et al., 2006; Wai, Lubinski & Benbow, 
2005. 

97 See Halpern et al., 2007, pp. 31-39, for a good dis¬ 
cussion of these social issues. 

98 Halpern et al., 2007, p. 4. 
mathematics. A recent review 99 documented
the following statements:

In American society parents are more 
likely to provide analytical, causal oriented 
explanations to boys than to girls. 

Middle school children generally accept 
the stereotyped view that females do not do 
well in mathematics and spatially oriented 
tasks. 

Male children begin to play and explore 
away from home at younger ages than 
girls do. 

Mathematics and science teachers tend 
to direct their discussions more toward boys 
than girls. 

Why do these differences arise? Is it 
because boys and girls draw forth differ¬ 
ent behaviors from the adults, or because 
the adults initiate these behaviors for either 
social or biological reasons? These are dis¬ 
tal explanations; the proximal facts are 
that when people are provided differential 
learning experiences they will learn differ¬ 
ent things. Psychologists cannot conduct an 
experiment to prove that this is a cause of 
differences in men’s and women’s perfor¬ 
mance on mathematical tasks - for people 
cannot be randomly assigned to lifestyles! 
There is an interesting natural observation 
that is consistent with the differential learn¬ 
ing explanation. 

In forty of the forty-one countries that 
participated in the PISA mathematics assess¬ 
ment program fifteen-year-old males out¬ 
performed fifteen-year-old females. 100 The 
sizes of the differences varied considerably. 
Nations also vary in the extent to which 
they differentiate between male and female 
social roles, including education. These dif¬ 
ferences are reflected in the World Eco¬ 
nomic Forum’s Gender Gap Index (GGI), 
which reflects the relative economic and 
social opportunities offered to men com¬ 
pared to those offered to women. The GGI 
correlates negatively (r = -.55) with the difference
between men’s and women’s scores

99 See Hyde, 2007, for detailed references. 

100 National Center for Educational Statistics, Digest of 
Educational Statistics, 2008, Table 389. The excep¬ 
tion was Thailand. 


on the 2003 PISA tests of mathematical com¬ 
petencies in fifteen-year-olds. The closer the 
economic and social opportunities for men 
and women are to being equal, the smaller 
the male advantage in mathematics. 101 
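
The cross-national relation just described is an ordinary Pearson correlation computed over countries. A minimal sketch of that computation follows. The country values are invented for illustration and are NOT the Guiso et al. data; the function name `pearson_r` and both variable names are likewise hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (made-up) country values, one entry per country.
# gender_gap_index: higher means more equal opportunities for women.
# math_gap: boys' mean mathematics score minus girls' mean score.
gender_gap_index = [0.59, 0.64, 0.66, 0.70, 0.75, 0.80]
math_gap = [20.0, 12.0, 15.0, 9.0, 11.0, 2.0]

r = pearson_r(gender_gap_index, math_gap)
print(f"r = {r:.2f}")  # negative: more equality goes with a smaller male advantage
```

A negative r of this kind says only that the two country-level measures move in opposite directions; like any correlation over nations, it does not by itself establish why.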

The differential learning explanation for 
male-female discrepancies in visual-spatial 
reasoning and (perhaps as a derivative) in 
mathematics is a compelling one. However, 
it fails to explain a sometimes overlooked 
fact. Why is it that the discrepancy in mathe¬ 
matics appears on tests but not in class work? 

Two explanations have been offered - 
one that blames the tests and the other that, 
in a sense, blames society. 

The explanation that blames the test itself 
has two aspects. One is that the test ques¬ 
tions tend to be interpreted in different ways 
by men and women, and that the differ¬ 
ence in interpretation hurts women’s scores. 
To take an extreme example, suppose that 
all the word problems on a mathematics 
test used sports examples. If women were 
less familiar with sports than men are, they 
would be at a disadvantage not because they 
had lower mathematical skills but because 
they had difficulty mapping from the verbal 
statement of the problem to the appropriate 
mathematical model. 

This argument may have had some valid¬ 
ity at one time, but is unlikely to be valid 
today. All the major tests currently in use 
are subjected to elaborate statistical anal¬ 
yses to ensure that the tests measure the 
same underlying latent trait, whatever it is, 
in both men and women. This sort of error 
is far more likely to occur in a locally gener¬ 
ated class examination than it is in a national 
testing program. 

The second reason the tests could be 
blamed is that they might fail to measure 
(or fail to adequately weight) some trait 
that is required for performance in class¬ 
room mathematics, and that is possessed 
to a greater extent by women than by 
men. We have already encountered this 

101 Guiso et al., 2008. See supporting online material for 
details of the computation. Interestingly, there is no 
correlation between the GGI and the male:female
variance ratios in different populations (Machin & 
Pekkarinen, 2008). 
sort of objection, in discussing Lynn and
Irwing's hypothesis that there is a differ¬ 
ence between test performance and class¬ 
room performance because women are more 
motivated to do well in class (or study more 
intelligently!). This seems to be likely. 

The explanation that blames society is 
that stereotype threat, as described in the 
introductory section of this chapter, may 
depress women’s test scores. The argument 
is that women think they will do poorly, 
and therefore give up more rapidly when 
faced with difficult mathematical problems. 
There is an elegant demonstration that this 
can happen. It involved a study of Asian 
women college students in a North Ameri¬ 
can university. 102 The women first answered 
questions designed to emphasize either their 
Asian identities or their identities as women. 
They then attempted to solve mathemat¬ 
ics problems. The idea was to produce two 
contrasting stereotypes - the belief that 
Asians do well in mathematics and the 
belief that women do poorly. The study 
worked; students who were induced to 
think of themselves as Asians did better 
than students who were induced to think 
of themselves as women, even though the 
assignment to different groups had been 
random. 

Experiments like this show that the 
stereotype threat effect exists. There are 
substantial questions about the size of 
the effect in situations in which the test 
has real consequences, such as a college 
entrance examination. In such situations 
examinees’ motivation to do well might 
override any anxiety introduced by stereo¬ 
type threat. The evidence is that it can, for 
reminding women of their gender has lit¬ 
tle effect on their performance on either 
high-level or low-level real examinations in 
advanced placement and community college 
classes. 103 If stereotype threat can be overridden
by motivation in these situations, it
seems logical that it would be overridden 
in even higher-stakes situations, such as col¬ 
lege entrance examinations. Direct evidence 

102 Shih, Pittinsky, & Ambady, 1999. 

103 Strickler & Ward, 2004. 


on this topic is extremely hard to obtain 
because it would not be ethical to set up a 
testing condition (the stereotype threat sit¬ 
uation) that was intended to lower the per¬ 
formance of one group of people relative 
to another. Statistical methods for detect¬ 
ing stereotype threat in high-stakes exami¬ 
nations without doing this have been sug¬ 
gested, but to my knowledge they have not 
been applied. 104 

All in all, it seems to me that the 
chief social reason for discrepancies between 
men’s and women’s performance in math¬ 
ematics, and the concomitant and perhaps 
related discrepancy in visual-spatial reason¬ 
ing, is simply a difference in interests. This 
difference causes girls to take fewer mathematics
courses than boys beyond the
primary grades. Because we learn what we
practice, adult women are less likely than 
men to have acquired high levels of math¬ 
ematical skill. Due to differences in inter¬ 
est, women are less likely than men to pur¬ 
sue careers in the STEM fields, even when 
they have developed the necessary cognitive 
skills. 

This conclusion leaves two questions 
open: why do women develop different 
interests than men, and are there biological 
reasons for the discrepancy? 

BIOLOGICAL REASONS FOR MALE- 
FEMALE DIFFERENCES IN COGNITION 
Evolutionary psychologists have offered a 
widely held distal biological explanation for 
differences in cognition between men and 
women - the “man the hunter, woman 
the gatherer” explanation. In every society 
there are sex role differences that go beyond 
the childbearing difference forced by biol¬ 
ogy. The evolutionary psychologists con¬ 
clude that these differences can be traced 
to an evolutionary advantage held by groups 
that, in prehistoric times, assigned the hunt¬ 
ing role to males and the gatherer-child care 
role to females. 

The story is that men in prehistoric soci¬ 
eties ranged widely as they hunted, and that 
those men who were better hunters had a 

104 Wicherts, Dolan, & Hessen, 2005. 
reproductive advantage, either because their
success gave them more access to females, 
or because their offspring were more likely 
to survive to reproductive age because good 
hunters could feed their children more reli¬ 
ably. This explanation also assumes that 
skills related to the R dimension of the g- 
VPR model, like the ability to judge the 
velocity of moving objects or to maintain 
orientation in space, aided in hunting. On 
the other hand, women are supposed to have 
primarily been gatherers of edible plants 
and small animals near the camp. A woman 
would have a reproductive advantage if she 
were a good gatherer, and so better able 
to feed her children. The final step is to 
assume that the ability to notice fine details, 
like an edible lizard in a bush or an edi¬ 
ble berry amid shrubbery, would make for 
a better gatherer. Women’s superior ver¬ 
bal abilities are explained by the assump¬ 
tion that women, being more dependent 
on others for protection of themselves and 
their offspring, had to be superior in social 
interchanges. 105 

The story is plausible. Slight reproductive 
advantages associated with various skills, 
acting over thousands of generations, could 
produce a substantial sexual imbalance in 
skills. However, the story is a story, not 
a fact. There is no direct evidence for it, 
because we know little, and probably never 
will know very much, about the behavioral 
characteristics of prehistoric Homo sapiens, 
let alone other hominid predecessors of our 
species. What we do know is inferred from 
indirect evidence, such as inferences about 
the rate of maturation of extinct hominids 
based on skeletal data, which then are used 
to infer the years required for protection 
of children, and then extrapolated to dis¬ 
cussions of male-female roles in mainte¬ 
nance of children, foraging, and group pro¬ 
tection. Analogies to the behaviors of exist¬ 
ing human hunter-gatherer societies are fre¬ 
quently used. 

Evolution certainly happened, and it 
could have happened this way. But the evi¬ 
dence is certainly, and of necessity, indirect. 

105 Geary, 2005, 2007a,b.
There is another evolutionary story,
based on social rather than biological mech¬ 
anisms. The human brain is designed for 
general learning, a trait that is very useful 
in a species that occupies multiple ecologi¬ 
cal niches. Certain behavioral practices that 
produce better social and economic organi¬ 
zation can lead to a competitive advantage 
at the level of the group, rather than the 
individual. Groups that adopt such prac¬ 
tices are more likely to survive. 106 These 
practices include differentiation of male and 
female roles. Since humans are learners, 
males and females will acquire different, 
role-appropriate cognitive skills. The ten¬ 
dency will be accentuated over time, but the 
accentuation will be due to social rather than 
biological evolution. The difference in cog¬ 
nitive skills that appears in late adolescence 
is due to different learning experiences that 
have evolved historically, not physiologi¬ 
cal differences in brain structure that have 
evolved genetically. 

This hypothesis emphasizes social his¬ 
tory. It can be given a biological twist. It may 
be that boys learn more spatial skills because 
they explore more, and girls learn better 
verbal skills because they socialize more. 
This could be due to genetic influences on 
the tendency to explore each environment. 
Thus sex differences in cognition (and other 
differences between people) could be due 
to differential learning experiences, but the 
differential learning experiences themselves 
could be under genetic influence. 107 

I see no way to differentiate among these 
hypotheses. In fact, they are not mutually 
exclusive; all of them could have some truth. 
It is tautologically true that all human vari¬ 
ations in behavior will be within the range 
of variation permitted by the genome; the 
most dedicated geneticist does not deny 
that humans are superb learners. We have 
learned to live with this ambiguity with 
respect to physical behaviors. No one denies 
that males are genetically predisposed to 
be able to swing sticks more rapidly than 
females, and no one denies that today’s 

106 Campbell, 1975; Wilson & Wilson, 2007, 2008. 

107 Bouchard, 1999. 
training methods routinely produce women
tennis players who hit the ball harder than 
the men’s champions of yesteryear. 

The strict version of the genetic hypoth¬ 
esis, that men and women have evolved to 
have different cognitive abilities due to dif¬ 
ferent physiological structures, rather than 
to different learning experiences, implies 
that there may be some physical difference 
between modern men and women that can 
be related to cognitive differences. Has such 
a physical difference been found? 

Richard Lynn has pointed to one candi¬ 
date difference. 108 Men have bigger brains 
than women. After allowing for body size, 
men’s brain volumes average about 100
cm³ larger than women’s brain volumes. 109
Within the sexes, brain volumes are posi¬ 
tively correlated with intelligence test scores 
(r ≈ .35; see Chapter 7). Lynn argues that
the difference in brain size accounts for the 
difference in general intelligence. He has 
amplified this argument by observing that 
girls mature faster than boys, so that the dif¬ 
ference in brain size does not appear until 
after adolescence. This parallels the devel¬ 
opment of the claimed difference in general 
intelligence. 

The explanation is appealingly simple and 
probably incorrect. 

The evidence for any substantial differ¬ 
ence between men and women in gen¬ 
eral intelligence is extremely weak. There¬ 
fore, Lynn’s argument leads to a paradox 
that, although not completely damning to 
Lynn’s line of reasoning, certainly makes it 
seem implausible. There are brain size differences
of about 20 cm³ between African-
and European-derived populations in the 
United States, with the Europeans hav¬ 
ing larger brains after correction for body 
size. This is much smaller than the dif¬ 
ference in brain volume between men 
and women. However, the difference in 
test scores between African-derived and 
European-derived Americans, roughly 15 
points on the IQ scale, is four to five 
times larger than any reported difference 

108 Lynn, 1994, 1999; Colom & Lynn, 2004. 
109 Ankney, 1992.


in test scores between men and women. 
This poses a contradiction, for it is not clear 
how a small brain size difference could pro¬ 
duce a large IQ difference between races, 
while a large brain size difference produced 
a small IQ difference between men and 
women. 110 
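
The contradiction can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes, purely for illustration, that a single linear brain-size-to-IQ slope applies to both comparisons; that simplification is introduced here and is not a model taken from Lynn or from the sources cited. All names are hypothetical, and the numbers are the approximate figures quoted in the text.

```python
# Approximate figures quoted in the text.
SEX_BRAIN_DIFF_CM3 = 100.0   # men vs. women, after allowing for body size
GROUP_BRAIN_DIFF_CM3 = 20.0  # European- vs. African-derived U.S. populations
GROUP_IQ_DIFF = 15.0         # IQ points between those two populations


def implied_points_per_cm3(iq_diff, brain_diff_cm3):
    """IQ points per cm^3 IF brain size alone produced the gap (an assumption)."""
    return iq_diff / brain_diff_cm3


# Slope needed for brain size to account for the between-population gap.
group_slope = implied_points_per_cm3(GROUP_IQ_DIFF, GROUP_BRAIN_DIFF_CM3)

# Male-female IQ gap that the same slope would predict.
predicted_sex_iq_diff = group_slope * SEX_BRAIN_DIFF_CM3

# The text says 15 points is four to five times any reported sex difference,
# so the largest reported sex difference is roughly 15/5 to 15/4 points.
reported_low = GROUP_IQ_DIFF / 5
reported_high = GROUP_IQ_DIFF / 4

print(group_slope)                  # 0.75
print(predicted_sex_iq_diff)        # 75.0
print(reported_low, reported_high)  # 3.0 3.75
```

Under the single-slope assumption the predicted male-female gap is some twenty times larger than anything reported, which is the paradox in numerical form.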

The fact that the male-female brain size 
discrepancy and the discrepancy in general 
intelligence (if it exists) develop in paral¬ 
lel is not a strong argument for causation. 
There are many changes in the difference 
between girls and boys from childhood to 
adolescence. These include both hormonal 
changes that alter brain action and changes 
in social roles and experiences. Picking out 
one of many bivariate correlations and offer¬ 
ing it as the explanation for a phenomenon, 
without a serious consideration of alterna¬ 
tive hypotheses, is not a compelling scien¬ 
tific argument. 

MALE-FEMALE DIFFERENCES IN BRAIN 
ORGANIZATION 

A fairly strong case can be made for the exis¬ 
tence of male-female differences in brain 
organization that are related to differences 
in cognition. 

Neural pathways can be divided into the 
myelinated white matter pathways and the 
unmyelinated gray matter pathways. The 
white matter pathways connect relatively 
distant centers of the brain, while the gray 
matter pathways deal with local connec¬ 
tions. There is a loose analogy to the dis¬ 
tinction between highways and city streets. 

Men have a higher proportion of white 
matter to gray matter than women do. 
In addition, women have somewhat larger 
connections between the hemispheres. This 
suggests that the male brain relies more 
on specialized processing centers in one or 
the other hemisphere than does the female 
brain, while the female brain spreads pro¬ 
cessing out across several possibly less pow¬ 
erful centers that perform the same or 
related functions. This is consistent with 
the observation that women have better 


110 Mackintosh, 1998, p. 184.
prospects of recovering from localized brain
damage than men. 111 

These ideas have been supported by 
imaging studies, which have shown that 
when men and women are faced with the 
same problems, such as solving progressive 
matrix items, they show somewhat differ¬ 
ent patterns of activation, and that perfor¬ 
mance correlates with anatomical measure¬ 
ments, such as white/gray matter ratios, in 
different places in the brain. 112 

The finding that men’s brains and 
women’s brains are, on the average, active 
in different places during problem solving 
excited a good deal of comment in the press, 
because the idea that “brain studies show 
that men and women think in different 
ways” understandably arouses a good deal
of popular interest. The finding should not 
be overinterpreted. Brain activities during 
problem solving will depend upon the 
strategy that the problem solver takes. It 
has been shown experimentally, and is not 
logically too surprising, that an individual 
can alter his or her brain activity by altering 
his or her problem-solving strategy. 113 It 
does not follow that men and women must 
adopt different problem-solving strategies 
because their brains are different, although 
it may be the case that differences in their 
brains predispose men and women to adopt 
different strategies. This conclusion does 
not lead to sweeping generalities, but that 
is the way it is! 

Spatial orientation and navigation 
deserve special attention. These aspects of 
thought, part of the R in the g-VPR model, 
produce some of the largest, most consistent 
male-female differences in behavior. They 
also produce an interesting qualitative 
distinction. Men are more likely to orient 
themselves in space using geometric cues 
about the overall environment, while 
women are more likely to use landmark and 
route information. 114 The same difference in 

111 See Gur & Gur, 2007, and Kimura, 1999, Chapters
10 and 11, for further references and discussion.

112 Haier, Jung, et al., 2004, 2005. 

113 Reichle, Carpenter & Just, 2000. 

114 Hunt, 2002, Chapter 6; Maguire, Burgess, &
O'Keefe, 1999.
behavior is found when comparing differences
in the way in which male and female
rats explore unfamiliar environments. 115 The
rat behavior has been associated with neural 
activation in the hippocampus and related 
structures deep in the medial temporal 
region of the brain. There are suggestions 
of differences in hippocampal structures 
between men and women, but the data is 
not yet definitive. As in the case of problem 
solving, orientation and navigation can 
be achieved by different strategies, and 
the strategy used will determine the brain 
region involved. 

To summarize, there are differences both 
in the structures of male and female brains 
and in the ways in which the brain is used 
in different aspects of problem solving. The 
differences seem to be especially large when 
the problem to be solved has a spatial-visual 
component. But what causes these structural
differences and differences in activity
in the first place? The answer to this turns
out to be genetic potential (of course!) interacting
with hormonal balance during key
periods of brain development.

The evidence for this comes from two 
sources: studies on nonhuman animals and 
studies of abnormal human development. 

Manipulation of adrenal levels prenatally 
and postnatally can influence the display of 
typical male or typical female behavior in 
rats, including the extent of engagement in 
rough-and-tumble play and the patterns of 
behavior in maze exploration. Obviously, 
conducting an analogous experimental study 
on humans would not be ethical. What we 
can do is to study certain medical conditions 
in which unusual hormonal concentrations 
occur. 

Congenital adrenal hyperplasia (CAH) is
a genetic condition in which the adrenal 
gland fails to generate a key enzyme, causing 
unusual sensitivity to male hormones. The 
condition can occur in both boys and girls. It 
is treated by restoring the normal hormone 
balance. Female CAH patients tend to have 
higher scores on spatial orientation tests 
than normal females. Males with CAH (a 

115 Ohnishi et al., 2006. 




less-studied group) tend to have lower scores 
than normal males. There are indications 
that this result generalizes to other behavior 
patterns in women, for female CAH patients 
display more masculine behaviors and interests
than do normal girls and women, including
such things as preferences for “male-appropriate”
or “female-appropriate” toys. 116

These results refer to effects of hormones 
on the developing brain. There is also evi¬ 
dence that circulating hormonal levels in 
adults will influence human cognition. The 
results are somewhat inconsistent, as the 
studies are difficult to do and generally 
involve small numbers of participants, a con¬ 
dition that invites null findings. Neverthe¬ 
less, certain results seem to be reasonably 
well established. 

In women, high levels of circulating estro¬ 
gens facilitate tasks involving verbal fluency 
and/or short-term memory. The evidence is 
mixed for performance on visual-spatial rea¬ 
soning, except for a consistent reduction in 
performance on mental rotation tasks. This 
has been established by two sources of data: 
studies of women tested at various times 
during their menstrual cycle, and studies 
of post-menopausal women who either are 
or are not receiving estrogen replacement 
therapy. 117 

Complementary results have been found 
in studies of testosterone. Testosterone ap¬ 
pears to have a non-monotonic effect on spa¬ 
tial reasoning, enhancing spatial reasoning 
in women and men with low testosterone 
(a common condition in the elderly), but 
decreasing spatial reasoning abilities in men 
with normal or high testosterone levels. The 
cognitive effects are complicated by the fact 
that circulating testosterone levels are asso¬ 
ciated with a myriad of other effects, includ¬ 
ing increases in impulsivity and aggressive 
behavior. To get some idea of the com¬ 
plexity of the effect, contrasting effects of 
testosterone on spatial rotation have been 
reported in England, the United States, 

116 Berenbaum & Resnick, 2007; Kimura, 1999, Chapter 9;
Puts et al., 2008.

117 Haussmann et al., 2000, and references therein;
Halpern & Tan, 2001.


and China. 118 The authors speculate that this 
is because of different emphases on speed 
versus accuracy of response in different cul¬ 
tures. The more general point is that in 
spatial-visual problem solving (which can be 
quite difficult) a hormonal effect could be on 
either the brain mechanisms required for the 
task itself or the brain mechanisms involved 
in selecting a problem-solving strategy. 119 

11.3.7. A Summary but No Answers

There is no simple answer to the ques¬ 
tion “What causes male-female differences 
in cognition?” There are biological differ¬ 
ences between men's and women's brains, 
and these differences influence both verbal 
fluency and, to a somewhat greater degree, 
spatial orientation, way finding, and the 
manipulation of visual images - the V and 
R dimensions of the g-VPR model. How¬ 
ever, biology is not destiny. Just as there 
is no doubt that there are biological pre¬ 
dispositions that operate to produce a femi¬ 
nine or masculine trend in thought, there are 
social influences that can either accentuate 
or override these trends. Given motivation, 
humans are powerful learners, and learning 
itself influences brain organization. 

Confused? You should be. Just as Diane 
Halpern said, the more you learn about 
sex differences in cognition, the less certain 
you are that you know the answers. Within 
very broad limits, biological and social influ¬ 
ences produce tendencies toward certain 
cognitive and social behaviors. Once this is 
done, the behaviors themselves alter biolog¬ 
ical makeup and social status. Women, on 
the average, are somewhat more verbal and 
somewhat less spatially oriented than men, 
but there are male chatterboxes and female 
helicopter pilots. 

There is one question that we can 
deal with. Are men more intelligent than 
women? The answer is clear. 

It is the wrong question. Men and women 
have somewhat different brains. This leads 


118 Yang et al., 2007. 

119 Kimura, 1999, Chapter 9. 
to somewhat different strengths and weaknesses
in cognition. Any battery-type intelligence
test will, by definition, consist of a mix
of tasks, some that favor the feminine style 
of thinking and some that favor the mas¬ 
culine style. The particular mixtures found 
in battery-type tests, such as the WAIS, 
have been determined more by history than 
by any analysis of why one mix is better 
than another. There have also been power¬ 
ful commercial forces that work against try¬ 
ing out any test battery that is too different 
from its predecessors. We should not disre¬ 
gard present batteries, and any male-female 
differences built into them, but we should 
not regard them as set in stone. 

One could construct test batteries that 
favor males, favor women, or are neutral. 
There would be no way to say which of 
these mixes was best without specifying the 
purpose of the test. A test battery for the 
selection of pilots will not and should not 
look like a test battery for the selection of 
law school students. 

Any individual test, including the much- 
used matrix tests, will evaluate g and test- 
specific features, a point that Spearman 
made a century ago. You can find a mea¬ 
sure of general intelligence that is g and ver¬ 
bally loaded, and produce an advantage for 
females, or produce a measure of general 
intelligence that is g and loaded on spatial- 
visual reasoning, and find an advantage for 
males. Depending on the purpose for which 
the test is to be used, one or the other solu¬ 
tion could be the correct one. 

It is sensible to try to understand male- 
female differences in cognition. It is not sen¬ 
sible to ask whether men are smarter than 
women. 


11.4. Race and Ethnicity 

This section will deal with racial and eth¬ 
nic differences in intelligence. The topic is 
probably the most controversial one in psy¬ 
chology. Jerry Carlson and I identified the 
following positions that have been taken: 120 
1. There are no differences in intelligence
between racial and ethnic groups. 

2. There are large differences in intelli¬ 
gence between racial and ethnic groups. 

3. Differences in intelligence between 
groups exist and are social in origin. 

4. Differences in intelligence between 
groups are genetic in origin. 

5. There is no such thing as race. 

The facts concerning racial and ethnic dif¬ 
ferences in IQ and similar test scores are 
clear. The causes and implications of these 
facts are not at all clear. Therefore, the sci¬ 
entific discussion can easily slip over into a 
heated dispute. The discussion and the dis¬ 
pute cannot be kept entirely apart. How¬ 
ever, I believe (hope) the discussion here 
will shed more light than heat. To this end 
I will first spend some time setting up the 
problem and identifying the key variables, 
prior to presenting a picture of what I think 
is known. Being hyper-careful about the def¬ 
inition and scope of a subject may be some¬ 
what boring, but in this case precision of 
thought is essential. 

In previous discussions of the conceptual 
nature of intelligence, beyond the opera¬ 
tional definition of a test score, the following 
points were made: 

1. Although the requirements for intelli¬ 
gence vary somewhat across societies, 
there is a common core - for exam¬ 
ple, the ability to comprehend verbal 
arguments or to maintain orientation in 
space. 

2. Intelligence tests and their analogs, 
such as personnel classification tests, 
are assessments of the cognitive skills 
required for success in industrial and 
post-industrial society. The test scores, 
although not perfect indicators of intel¬ 
ligence, are relevant in that society. 
They will be relevant to success in other 
societies to the extent that the tests tap 
the cognitive skills relevant for success 
in the target society. The extent of rel¬ 
evance has to be determined on a case- 
by-case basis. 


120 Hunt & Carlson, 2007a. 
3. Evaluating the sorts of skills required
by the post-industrial societies is important
in itself. These societies dominate
the globe now and will do so for the 
foreseeable future. Learning about intel¬ 
ligence as it is expressed in London, 
Madrid, or New York will have more 
impact on the human condition than 
learning about the expression of intel¬ 
ligence in small groups in the forests of 
South America. This is not to say that 
anthropological studies are uninterest¬ 
ing; they can be very informative. The 
point is just that even if it were shown 
that intelligence, as tested, is wholly 
confined to the post-industrial societies, 
the topic would still be important. 

4. All cognitive skills are ultimately deter¬ 
mined by biology. However, it is impor¬ 
tant to distinguish between genetic con¬ 
tributions, which establish potentials, 
and life experiences, which determine 
the extent to which a potential is real¬ 
ized. This can be very difficult to do. 
Measuring the size of the genetic contri¬ 
bution to variance in a particular society 
or subsociety is not the same as under¬ 
standing it. 

Do races and ethnic groups exist? It has 
been claimed that these are solely social dis¬ 
tinctions. Because the definition of a group 
varies from time to time and place to place, 
it is argued, they cannot be biological nat¬ 
ural kinds. Therefore, racial differences are 
not amenable to scientific study because the 
basic concept of race is ill-defined. 121 This 
argument has been buttressed by the fact 
that genetic diversity within groups is larger 
than genetic diversity between groups. 122 

The counterargument is that in fact, peo¬ 
ple identify and are identified with racial and 
ethnic groups. These distinctions are highly 
correlated with a clustering of social and 
biological variables. Socially, for instance, 
68% of the residents of the United States 
who identify themselves as Hispanic are 
Catholics. Only 20% of the non-Hispanic 

121 Fish, 2002; Smedley & Smedley, 2005. 

122 Lewontin, 1974. 


residents are Catholic. 123 Therefore, Catholi¬ 
cism is statistically associated with ethnic 
identity, even though there are Hispanics
who are not Catholic and Catholics
who are not Hispanic. The same reason¬ 
ing applies to genetic variation. While it is 
true that within-group genetic variation is 
greater than between-group variation, the 
amount of genetic variation between groups 
is quite sufficient, statistically, to make accu¬ 
rate assignments associating an individual 
with one of the three largest groups of ori¬ 
gin in the United States - African, European, 
and East Asian. 124 There is also a high level 
of agreement between self-identification as 
White, Black, East Asian, or Latino and 
assignment of a person to clusters based 
upon similarities of their genomes. 125 While 
there is no one defining characteristic, other 
than self-identification, that assigns some¬ 
one to a particular racial/ethnic group, iden¬ 
tification of a person as a member of a 
racial/ethnic group will probabilistically tell 
us something about that person’s standing 
on a variety of social and biological variables. 
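
The statistical point, that small between-group differences on many variables can add up to accurate assignment, can be illustrated with a toy calculation. The sketch below is not the analysis reported in the studies cited above. The two groups, the allele frequencies (0.45 and 0.55), the marker counts, and the assumption that markers are independent and equally informative are all hypothetical values chosen for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def between_group_share(p_a, p_b):
    """Share of a single marker's variance that lies between two equal-sized groups."""
    p_bar = (p_a + p_b) / 2
    total = p_bar * (1 - p_bar)
    within = (p_a * (1 - p_a) + p_b * (1 - p_b)) / 2
    return (total - within) / total

def assignment_accuracy(n_loci, p_a, p_b):
    """Expected accuracy of assigning a person to group A or B (requires p_a < p_b).

    Each person carries two allele copies per locus. The rule assigns a person
    to group B when the allele count exceeds the midpoint of the two group means.
    """
    n = 2 * n_loci
    cutoff = n * (p_a + p_b) / 2
    correct_a = sum(binom_pmf(k, n, p_a) for k in range(n + 1) if k <= cutoff)
    correct_b = sum(binom_pmf(k, n, p_b) for k in range(n + 1) if k > cutoff)
    return (correct_a + correct_b) / 2

# Hypothetical allele frequencies in the two groups.
P_A, P_B = 0.45, 0.55

print("between-group share of variance per marker:", between_group_share(P_A, P_B))
for loci in (1, 10, 100, 300):
    print(loci, "markers -> accuracy", round(assignment_accuracy(loci, P_A, P_B), 3))
```

Under these assumptions, nearly all of each marker's variance lies within groups, yet accuracy rises steadily as markers are combined. Real genomic data violate the independence assumption, so the numbers should be read only as an illustration of the logic.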

The key word is “probabilistic.” Artifi¬ 
cial Intelligence researchers have developed 
the idea of a “fuzzy” concept, a class in 
which membership is determined to varying 
degrees rather than being a binary property. 
The political concept conservative is a good 
example. An individual may espouse con¬ 
servative beliefs with an intensity that varies 
with the topic. Race and ethnic member¬ 
ship are fuzzy concepts. There are no defin¬ 
ing characteristics of a racial or ethnic group, 
but there are statistical tendencies. 

The de facto definition of races and eth¬ 
nic groups - that is, the extent to which 
a person has to possess the (fuzzy) prop¬ 
erties in order to be designated a group 
member - varies over time and place. In 
modern American categorization the term 
“Asian-American” includes genetic Japanese 
who are fourth-generation Americans, who 
do not speak Japanese and have never been 
to Japan, along with green card holders 

123 New York Times, 26 April 2007. 

124 Bamshad et al., 2004.

125 Tang et al., 2005. 
who just landed from Cambodia and speak
marginal English. Similar examples could be 
offered for every other large racial and eth¬ 
nic group in North America and Europe. 
Such variety means that for some purposes 
the classification of individuals into broad 
racial/ethnic groups is nearly useless. For 
other purposes it makes sense. 

Knowledge about covariation between 
racial/ethnic identification and important 
social outcomes, such as health status, can 
serve as clues to tell us what variables to 
look at as we attempt to understand causal 
relationships. For instance, it is empirically 
the case that high blood pressure is more 
common in the African American than in 
the White community. 126 What is it in the 
African American genetic constitution or 
social practice that leads to high blood pres¬ 
sure? Questions like this ought to be inves¬ 
tigated by the biomedical sciences. 127 Simi¬ 
larly, if there is a discrepancy in intelligence 
test scores between the African American 
and other American communities (and there 
is), seeking to understand the reasons for 
and implications of the discrepancy are legit¬ 
imate issues for the psychological sciences. 

Research on racial/ethnic distinctions can 
inform policy making. Most of the devel¬ 
oped societies do make policy decisions 
about racial/ethnic groups. The United 
States, for instance, has developed an elab¬ 
orate legal doctrine aimed at the prohibi¬ 
tion of discrimination. Two American psy¬ 
chologists, A. & B. D. Smedley, writing 
in the American Psychologist, a high-impact
journal, vehemently rejected the idea of 
race as a biological concept, and at the 
same time urged the government to con¬ 
tinue to compile statistics on social inequal¬ 
ity between races (and, by extension, ethnic 
groups) for the purpose of eliminating such 
inequalities. 128 

Such a position contains a contradiction. 
Policy makers should be informed of rel¬ 
evant scientific information, which should 

126 Information downloaded from www.americanheart.org/presenter.jhtml?identifier=462i, January 2009.

127 See Bamshad et al., 2004, for further examples. 

128 Smedley & Smedley, 2005. 


be considered along with other things when 
policy decisions are made. In the case of 
racial/ethnic differences, the scientific infor¬ 
mation needed can be obtained only by doc¬ 
umenting the existence of ethnic differences 
in intelligence, understanding the implica¬ 
tions of these differences, and inquiring into 
all possible causes, both social and biologi¬ 
cal. The Smedleys worried, with some rea¬ 
son, that such information could be used to 
justify discriminatory practices. This could 
happen; but if it does, it will be the result of 
political decisions combined with scientific 
findings, not the scientific findings them¬ 
selves. I strongly believe that it is better 
to proceed with information rather than to 
proceed in ignorance. 

In practice, investigators who study racial 
and ethnic differences face major obstacles. 
These obstacles have had an impact on the 
quality of research in the field. Let us see 
why. 

11.4.1. Issues of Quality in Research
on Race and Ethnicity

Racial and ethnic distinctions are both 
fuzzy, in the sense just defined, and incon¬ 
sistent over time and place. Brazilians intro¬ 
duce distinctions that Americans do not, 
referring to people as white and black, but 
also as moreno, mestizo, and pardo. The
terms Indian and Native American are used 
today much more broadly than they were 
in the past. Given that the answer to the 
question “who is what?” changes over place 
and time, it is no wonder that it is hard to 
find how a behavioral capacity, such as intel¬ 
ligence, correlates with race and ethnicity. 

Race and ethnicity are seldom causal vari¬ 
ables in themselves. The only exceptions are 
in studies of the reactions of others to a 
person of a given race, and studies of the 
effects of self-identity on people’s concep¬ 
tions of themselves. In these situations the 
application of a label may itself be a stim¬ 
ulus that leads to action. Outside of such 
situations, a researcher who studies a corre¬ 
lation between racial/ethnic status and a test 
score is studying the correlation between the 
score and the many possibly causal variables 
correlated with race and ethnicity. These
include educational status, biological dis¬ 
tinctions, health and wealth, and a variety of 
beliefs and behavioral practices. This com¬ 
plicates the assignment of causes for indi¬ 
vidual differences. The bivariate argument 
that “because racial/ethnic groups differ on 
possible cause X and outcome Y, cause X 
must produce group differences on Y” is an 
example of seizing on a bivariate correla¬ 
tion in the presence of collinearity. Such 
arguments should be regarded with great 
suspicion. 
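
The danger of the bivariate argument can be seen in a small simulation. This is a sketch on invented data, not a reanalysis of any study: the variable names, the size of the group shift, and the sample size are all hypothetical. Variable x differs between two groups but has no effect on the outcome y, while a second group-linked variable z does drive y.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(n_per_group=2000, shift=2.0, seed=0):
    """Return (group, x, y) rows in which x never influences y."""
    rng = random.Random(seed)
    rows = []
    for group in (0, 1):
        for _ in range(n_per_group):
            x = shift * group + rng.gauss(0, 1)  # differs by group, no effect on y
            z = shift * group + rng.gauss(0, 1)  # differs by group, drives y
            y = z + rng.gauss(0, 1)
            rows.append((group, x, y))
    return rows

rows = simulate()
pooled = pearson([r[1] for r in rows], [r[2] for r in rows])
within = [
    pearson([r[1] for r in rows if r[0] == g], [r[2] for r in rows if r[0] == g])
    for g in (0, 1)
]
print("pooled correlation of x and y:", round(pooled, 3))
print("within-group correlations:", [round(w, 3) for w in within])
```

In this simulation the pooled correlation between x and y is substantial, while the correlation within each group is close to zero. The pooled association arises only because x travels with group membership, which is the trap the bivariate argument falls into.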

Differential recruitment is also a prob¬ 
lem. In particular, college students, the 
default sample of convenience, are easy to 
study. However, college students are not a 
random sample of any racial or ethnic group, 
which makes generalizations tricky. 

Finally, there are social (and sometimes 
legal) pressures restricting research on racial 
and ethnic groups. While I may think that 
the need to overcome ignorance overrides 
the possibility that a finding on race or eth¬ 
nicity may be used to justify discrimination, 
there are other opinions. There are audi¬ 
ences who believe that anyone who reports 
racial/ethnic differences must be a racist, 
and concomitantly that anyone who reports 
a finding questioning the existence of racial 
and ethnic differences in intelligence must 
be on the side of the angels. There are also 
audiences who are longing to hear about 
racial differences in intelligence, because 
they want to use such findings to justify dis¬ 
criminating against, or not providing affir¬ 
mative action for, an ethnic group. This has 
led both to a difficulty in getting funds to do 
good research in the field and to a tempta¬ 
tion to oversimplify a finding, in order to tell 
different audiences what they want to hear. 

I am not alone in my concerns. In the 
1970s Lindzey, Loehlin, and Spuhler closed a 
widely respected review of racial difference 
in intelligence by saying 

The design, execution, and reporting of 
studies of racial-ethnic differences in intel¬ 
ligence often leave much to be desired. 

The conclusions that we have attempted 
to draw from these data are often limited 


by this fact. We have been concerned 
privately by the number of instances in 
which the political and social preferences 
of the investigators apparently have grossly 
biased their interpretations of the data. 
Such distortions appear to be at least as 
prevalent at environmentalist as at hereditarian
extremes.

Lindzey, Loehlin, & Spuhler, 1975, 
pp. 232-233 

Writing over thirty years later, I am sorry 
that I have to say that Loehlin, Lindzey, 
and Spuhler are still correct. Bad research 
is itself a fuzzy concept. Very few studies 
are uniformly good or bad. In this field, more 
than in others, it is important to consider the 
strengths and weaknesses of a study before 
generalizing its findings. Panel 11.1 presented 
some cautions about research on group dif¬ 
ferences. I urge readers to consider their 
impact when evaluating a study on racial and 
ethnic groups, reported here or anywhere 
else. 

11.4.2. Coverage and Terminology

For the most part, I will be concerned with
the four major racial/ethnic groups in the
United States: African Americans, Asian-Americans,
Hispanics, and Whites. On occasion,
studies from other countries will be
considered. The four American groups are those
for which the most data has been gathered. In addition,
these groups live in a large, post-industrial 
society; the vast majority of their members 
speak a common language, English, either as 
a first language or a strongly held second lan¬ 
guage; and they have generally comparable, 
although certainly not identical, living stan¬ 
dards, compared to the contrast between, 
say, Whites in the United Kingdom and 
Africans in Somalia. 

The four American groups have all the 
characteristics of a fuzzy class, defined on 
either social or biological variables. There is 
no one variable, or even one group of vari¬ 
ables, that defines group membership. How¬ 
ever, statistical clusterings based on either 
social or genetic variables do correspond 
closely to self-identification with a group. 

In addition, these groups have a political 
identity. Laws, regulations, and government 
programs exist that distinguish among them
in a variety of ways. As a consequence, 
the government keeps extensive statistics 
on characteristics of their health, education, 
and economic status. It is often important to 
consult these statistics in order to untangle 
issues due to collinearity. 

Finally, a word about terminology. 

Anyone who attempts to name racial/ 
ethnic groups in the United States is shoot¬ 
ing at a moving target. W. E. B. Du Bois 
(1868-1963), a leading social activist of the 
1900-50 period and one of the founders of 
the National Association for the Advance¬ 
ment of Colored People (NAACP), referred 
to himself as a Negro. The term is proscribed 
today. The term “colored people” is part of 
the NAACP's name, but is now also pro¬ 
scribed. However, the term “people of color” 
is appropriate, at least as of 2010. 

I will use the terms African American and 
Black interchangeably unless, of course, I 
am referring to a group outside the United 
States. In present US government docu¬ 
ments Hispanic or Latino is used to refer to 
US residents who are immigrants from, or 
descendants of immigrants from, a Spanish- 
or Portuguese-speaking country in the West¬ 
ern Hemisphere. The term is also used for 
descendants of the Spanish/Mexican people 
who settled in California and the South¬ 
west prior to the 1840s. Surprisingly, the 
government does not include in this term 
people whose family origin was Spain or 
Portugal, although this policy seems to be 
applied inconsistently. I will follow govern¬ 
ment usage, as it exists in 2010. 

While I shall have to use the term Asian- 
American at times, especially when refer¬ 
ring to government records, I will attempt 
to refer to more specific groups, such as the 
Japanese, as I believe that the current official 
designation Asian-American is far too broad. 

White will be my catch-all term for all 
other groups. These are primarily Ameri¬ 
can residents of European descent. There are 
substantial communities who have cultural 
and genetic ties to Armenia, Georgia, Iran, 
Israel, and other Middle Eastern nations, but 
they are all European-Americans to the US 
census! When appropriate I will refer to 


specific communities within the broad Asian 
and White designations. 

11.4.3. Test Score Gaps between Whites,
African Americans, and Hispanics

There is a long history of studies of racial/ 
ethnic differences in test scores. In their 1975 
review Lindzey, Loehlin, and Spuhler, citing 
earlier data by Yerkes, reported that in 1917-18,
during World War I, the mean score on
the Army tests for White recruits was 1.16
standard deviation units above the mean for
African American recruits. Studies of enlistees
in World War II and the Vietnam War 129
showed a mean difference of 1.52 standard
deviation units in favor of Whites. Lindzey
and colleagues were careful
to point out that this is not evidence that
the Black-White differences in intelligence 
had increased from 1918 to the 1960s, because 
military enlistees are not a representative 
sample of the country, and because different 
recruitment/conscription standards were in 
effect in the two wars. However, this is certainly
not evidence for a presumed reduction
in the difference! 

In order to make a comparison between 
the scores of different groups we need to 
have data from a representative sample of 
the national population. Table 11.4 presents 
the results from several such surveys involv¬ 
ing battery-type tests. There is some vari¬ 
ety in the results, but not a great deal. The 
African American means are about 1 stan¬ 
dard deviation unit (15 points on the IQ 
scale) below the White means, and the His¬ 
panic means fall in between. 
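
The conversion from standard deviation units to IQ points is simple arithmetic. The sketch below applies it to the White-African American column of Table 11.4, assuming each test is expressed on the conventional IQ metric with a standard deviation of 15, as in the text. The shortened test labels and the function name are illustrative only.

```python
IQ_SD = 15.0  # standard deviation of the conventional IQ scale

# White mean minus African American mean, in SD units, as listed in Table 11.4.
gaps_sd = {
    "WAIS III General Ability Index": 0.95,
    "AFQT, NLSY79 men": 1.07,
    "AFQT, NLSY79 women": 1.00,
    "AFQT, NLSY79 all groups": 1.2,
    "Wide Range Intelligence Test": 0.85,
    "Woodcock-Johnson 3 General Intellectual Ability": 1.05,
}

def to_iq_points(gap_sd, sd=IQ_SD):
    """Convert a gap in standard deviation units to points on the IQ scale."""
    return round(gap_sd * sd, 2)

gaps_iq = {test: to_iq_points(gap) for test, gap in gaps_sd.items()}

for test, points in gaps_iq.items():
    print(f"{test}: {gaps_sd[test]:.2f} SD = {points:.2f} IQ points")
print("smallest gap:", min(gaps_iq.values()), "largest gap:", max(gaps_iq.values()))
```

The rows are not independent samples (three of them draw on the NLSY79), so the spread is a description of the table rather than an estimate with a known error.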

A similar picture is obtained from 
comparisons involving the Raven Pro¬ 
gressive Matrices (RPM) tests. Figure 
11.21 shows the median RPM test score 
obtained in a school district in the west¬ 
ern United States, as a function of age 
and racial/ethnic group. 130 We see the 
same picture reflected in the scores on 
battery-type tests. Whites outscore African 

129 Lindzey, Loehlin, & Spuhler, 1975, p. 143. 

130 Raven, 2008a. J. Raven has recommended reporting
RPM scores as percentiles, rather than in terms of
summary statistics, such as means and variances. See
Raven, 2008b, p. 60 (note 1.55), for a justification.
Table 11.4. Values of (White mean - African American mean) and (White mean - Latino
mean) in standard deviation units for a variety of cognitive tests, using cases where a
reasonably large standardization sample has been obtained

Population                      Test Used                                         African American  Latino  Reference
WAIS III adult standardization  WAIS General Ability Index                        .95               .65     Lange et al., 2006
NLSY79, men                     AFQT                                              1.07                      Scullin et al., 2000
NLSY79, women                   AFQT                                              1.00                      Scullin et al., 2000
NLSY79, all groups              AFQT                                              1.2               .93     Herrnstein & Murray, 1994, pp. 275, 278
Standardization sample          Wide Range Intelligence Test                      .85               .51     Shields, Konold, & Glutting, 2004
Standardization sample          Woodcock-Johnson 3: General Intellectual Ability  1.05                      Murray, 2007


Americans, and Hispanic scores fall somewhere
in between.

In order to avoid recruitment effects, 
Table 11.4 cites studies using relatively 
large samples, where an attempt was made 
to obtain a sample representative of a 
defined population. The samples involved 
cover a wide range of people, from the 
five to sixty-five years age range in the 
Woodcock-Johnson standardization sample 
to the schoolchildren studied in the RPM 
standardization. Similar results can be found 
by averaging over the more than 150 studies 
that have used convenience samples. 131 

Similar differences are found internation¬ 
ally. Historically there have been a num¬ 
ber of studies comparing Whites to other 
racial/ethnic groups in a variety of countries. 
Because there have been major changes in 
the economic and health status of many 
developing countries, the best course is 
probably to look at the recent literature 
rather than at that of over thirty years ago. 

J. P. Rushton, Lynn, and a number of 
their colleagues have conducted a wide- 

131 Herrnstein & Murray, 1994, p. 277; Jensen, 1998, 
p. 354. The Herrnstein and Murray citation gives 
references to specific studies. Jensen’s citation does 
not, but it apparently refers to an analysis that he 
conducted. 


ranging series of studies in which they 
use the Raven Progressive Matrices tests 
to evaluate group differences within var¬ 
ious countries. All obtain the general 
results observed on the Raven standardiza¬ 
tion. Whites do better than Blacks, with 
other ethnic groups somewhere in between. 
The studies involved include a Roma-Serbian
contrast in the Balkans, 132 White-Indian-mixed
race-Black contrasts in South
Africa, 133 and a contrast between Whites,
Mestizos (mixed White-Native American),
and Native Americans in Mexico. 134 In all
these studies Whites and Asians obtain 
the highest scores, and Blacks the lowest, 
with other racial/ethnic groups falling in 
between. 

The Mexican study provides a good 
example of this work, because it is some¬ 
what more extensive than several of the 
other studies. Lynn and colleagues tested 
elementary school children, aged seven to 
ten, near Ensenada, in the state of Baja Cali¬ 
fornia. The ordering of means was what the 
experimenters had anticipated: Whites (IQ 
equivalent ~ 100), Mestizos (mixed Native 

132 Rushton, Cvorovic, & Bons, 2007. 

133 Rushton & Skuy, 2000; Rushton, Skuy, & Fridjohn, 2003.

134 Lynn, Backhoff, & Contreras, 2005. 
Figure 11.21. Raven Progressive Matrices scores in a school district
in the western United States, as a function of age and racial/ethnic 
group. Data taken in the 1980s. Source: J. Raven, 2008b, Table 8.3. 


American-White groups) (IQ equivalent ~
95), and Native Americans (IQ equivalent ~
83). (See Figure 11.22.) The authors point out
that these results resemble those obtained 
in the United States, where Mexican immi¬ 
grants have scores below Whites, and Native 
Americans tend to have still lower scores. 
The differences were substantial. Seven- 
year-old White and Mestizo children solved 
progressive matrix problems at a level not 
obtained by Native American children until 
they are nine or ten. 

Lynn and colleagues’ results for Mexi¬ 
cans in Mexico can be compared to Raven’s 
results for Latinos in the southwestern 
United States. This is done in Figure 11.23, 
which shows a striking continuity in changes 


of test scores across ages, within each ethnic 
group. 

Because the Raven tests are often referred 
to as measures of g, there is a temptation 
to interpret these results as showing that 
White populations possess general intelli¬ 
gence to a greater degree than nonwhite 
populations living in close proximity to the 
White groups. As Lynn himself has indi¬ 
cated, it is not appropriate to draw such 
a conclusion based on results from a sin¬ 
gle test. However, similar results implicat¬ 
ing differences in g have also been found 
in European studies that used batteries of 
subtests designed to evaluate narrow cogni¬ 
tive functions. The populations compared 
included children from different immigrant 



Figure 11.22. The median number of Raven's Standard Progressive 
Matrices problems solved by Native American, Mestizo, and White 
children in Mexico. Data from Lynn, Backhoff, & Contreras, 2005, 
Table 2. 


























Figure 11.23 axes and legend: age of children (6 to 17) on the horizontal axis; series are Lynn White, Raven White, Lynn Mestizo, and Raven Hispanic. 

Figure 11.23. Median number of problems solved as a function of 
age and ethnicity. Data from Raven, 2008a, Table 8.2 (half-year 
intervals) and from Lynn, Backhoff, & Contreras, 2005, Table 2. 


groups in Europe and adult immigrants from 
the Netherlands Antilles compared to native 
Dutch employees in the railway system in 
the Netherlands. 135 

11.4.4. A Closer Look at the Nature 
of Racial/Ethnic Differences 

In the middle of the twentieth century a 
study was done on variations in first-grade 
children’s intelligence that were associated 
with ethnic status and social class (SES). 136 
The authors concluded that the level of 
intelligence was associated with SES, and 
that there were patterns of differential abil¬ 
ity associated with ethnic groups. Asians 
were said to have high spatial ability, and 
Jewish children to have high verbal abili¬ 
ties. Since that time there have been sev¬ 
eral efforts to determine the nature of the 
differences in the intelligence of various 
racial/ethnic groups, beyond the omnibus 
statement that test scores vary from group 
to group. 

Most of these studies have been presented 
as investigations of “Spearman’s hypothesis.” 
The strong version of this hypothesis is that 
all intergroup differences in intelligence are 
due to differences in general intelligence. 
The weak form is that the majority of these 
differences are due to general intelligence, 

135 Helms-Lorenz, van de Vijver, & Poortinga, 2003; te 
Nijenhuis et al., 2004. 

136 Lesser, Fifer, and Clark, 1965. 


but that differences in lower-order factors 
(e.g., verbal and spatial-visual reasoning) 
may also contribute to group differences. 137 

Jensen summarized a number of studies 
testing Spearman’s hypothesis that had been 
carried out through about 1995 using the 
method of correlated vectors. He concluded 
that the correlation between test loadings on 
a g factor and the Black-White differences 
in test scores is about .60, and considerably 
higher in some tests. 138 Jensen interpreted 
this as substantial support for the weak form 
of Spearman’s hypothesis. He further con¬ 
cluded that spatial-visual reasoning, which 
tends to show fairly large Black-White dif¬ 
ferences, was responsible for the remaining 
differences. 
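
The method of correlated vectors can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are invented here, the six subtests and their values are made up, and a rank correlation is used as one common choice rather than as a record of what any cited study computed.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """Ranks starting at 1, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def correlated_vectors(g_loadings, group_differences):
    """Rank correlation between subtest g loadings and group differences (d)."""
    return pearson(ranks(g_loadings), ranks(group_differences))

# Made-up values for six hypothetical subtests, for illustration only.
g_loadings = [0.45, 0.55, 0.60, 0.70, 0.75, 0.80]
group_differences = [0.50, 0.40, 0.70, 0.60, 0.90, 0.80]

print(round(correlated_vectors(g_loadings, group_differences), 2))
```

Note that the calculation uses one data point per subtest, so a battery of six subtests yields a correlation based on six observations. This is part of the reason the technique has been questioned.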

Subsequently the Danish psychologist 
Helmut Nyborg collaborated with Jensen on 
a large study of African American-White 
differences among Vietnam veterans. 139 
Because conscription was used during the 
Vietnam War the study participants were 
somewhat representative of men from the 
cohorts born in the United States dur¬ 
ing the 1940s and early 1950s. Nyborg and 
his colleagues updated Jensen’s review and 
published further data testing a White- 
Hispanic contrast in the Vietnam veteran 
population. 140 These studies also reported 

137 Jensen, 1998, p. 372. 

138 Ibid., pp. 376 ff. 

139 Nyborg & Jensen, 2001. 

140 Hartmann, Kruuse, & Nyborg, 2007. 
support for the weak form of the hypothesis, 
that racial/ethnic differences in cognitive 
skills are generally due to differences in g, 
but that other traits may be involved. 

Nyborg and Jensen’s conclusions, and 
many similar conclusions about differences 
between racial/ethnic groups, rest on the 
validity of the method of correlated vectors. 
Several questions have been raised about the 
technique. These are described in Panel 11.6. 
One of the most serious criticisms is that 
the method of correlated vectors tests a sin¬ 
gle hypothesis against “chance,” the possibil¬ 
ity that the correlation between g loadings 
and group differences would be produced 
by random assignment of numbers. A bet¬ 
ter question is whether Spearman’s hypoth¬ 
esis fares well against competing hypothe¬ 
ses. Where a comparison has been made, the 
results have been equivocal. In the study of 
immigrant children cited earlier, the investi¬ 
gators also applied the method of correlated 
vectors, but instead of using g loadings they 
used “cultural loadings” assigned by having 
a panel of graduate students rate whether 
or not particular subtests were culturally 
loaded. Aggregated cultural ratings had a 
markedly higher correlation with group dif¬ 
ferences than the g loadings did. 141 

The conclusion that racial/ethnic group 
differences are primarily due to differences 
in g has been repeated several times in 
the secondary literature, in spite of the 
questions raised by Dolan’s analyses. The 
Danish investigators who summarized the 
evidence after Jensen’s 1998 summary did 
not even mention Dolan’s work. Some¬ 
times the summarizers have actually mis¬ 
stated Dolan’s conclusions! When Rushton 
and Jensen published what they regarded as 
a summary of fifty years of studies of White- 
Black differences, they had this to say about 
Dolan’s work: 

The results statistically confirmed the conclusion 
derived from the method of correlated 
vectors regarding a “weak form” 

141 Helms-Lorenz, Van de Vijver, & Poortinga, 
2003, Table 7. See particularly the lower right- 
hand entries, which summarize the complicated 
analyses. 


of Spearman’s hypothesis: Black-White 
group differences were predominantly on 
the g factor, although the groups also 
showed differences on some lower order fac¬ 
tors (e.g., short-term memory and spatial 
ability) independent of g. 

Rushton & Jensen, 2005, p. 248 

Here is what the original authors said to 
summarize their work: 

It is concluded that the Spearman correlation, 
as a test of the importance of g in B-W 
differences, lacks specificity. The results of 
the MGCFAs suggest that it is very difficult 
to distinguish between competing hypothe¬ 
ses concerning the latent sources of B-W 
differences. 

Dolan & Hamaker, 2001, abstract 

Where does this somewhat technical 
argument leave the substantive conclusion? 
Rushton and Jensen say that Spearman’s 
hypothesis should be regarded as an empir¬ 
ical fact. The technical arguments and the 
various demonstrations by Dolan and his 
colleagues show that this is not the case. 
However, a demonstration of the weakness 
of the method should not be used to take 
another (indefensible) position - that group 
differences are independent of g. The results 
of the Dutch study that contrasted cultural 
and g differences as explanations for dif¬ 
ferences in test scores, although interest¬ 
ing, are in disagreement with several smaller 
studies from the United States. However, 
it is difficult to compare the degree of cul¬ 
tural difference between Native Dutch and 
immigrants, on the one hand, and between 
Whites and Blacks in the United States on 
the other. 

It is probably the case that most group 
differences in cognitive skills are due in part 
to differences in general reasoning, but the 
extent is not clear. It would be nice to have 
a reanalysis of some of the key studies using 
appropriate statistical methods. 

11.4.5. Group Differences on Educational 
Assessments 

In the United States African American 
students have markedly lower educational 
achievement than Whites; Asian-Americans 
Panel 11.6. The James Watson Affair 

James Watson (1928-) is arguably Amer¬ 
ica’s most famous living scientist. In 1962 
he, along with his colleague Francis Crick, 
received the Nobel Prize for their dis¬ 
covery of the structure of the DNA 
molecule. Their work sparked the explo¬ 
sion of discoveries in molecular biology 
that have continued to this day. Subse¬ 
quently Watson left his appointment at 
Harvard to head the Cold Spring Harbor 
(New York) Laboratory, which he built 
into a major scientific institution. Late in 
his career Watson was a leading figure in 
the successful effort to catalog the human 
genome. 

In 2007 Watson, who has always been 
an outspoken person, wrote an autobiography, 
Avoid Boring People. He was 
scheduled to give several public talks in 
England. Just prior to the talks he gave 
an interview to the (London) Sunday 
Times.* The following quotes are taken 
verbatim from the article: 

1. He is inherently gloomy about the 
prospect of Africa. 

2. All our social policies are based on the 
fact that their intelligence is the same 
as ours, whereas all the testing says not 
really. 

3. His hope is that everyone is equal, but 
“people who have had to deal with 
black employees find that this is not 
true.” 

4. There is no firm reason to anticipate 
that the intellectual capacities of peo¬ 
ples geographically separated in their 
evolution should prove to have evolved 
identically. 

5. Our wanting to reserve equal powers 
of reason as some universal heritage 
of humanity will not be enough to make 
it so. 

A huge public furor ensued. The New 
York Times published a lengthy discussion 


of subsequent debates between Watson’s 
supporters and critics. For some reason 
this appeared on the Arts page!† Many 
of Watson's speaking engagements were 
cancelled abruptly. He was almost imme¬ 
diately removed as director of the Cold 
Spring Harbor Laboratory. 

Let us look more closely at what Wat¬ 
son said. Statement (1) is a personal opin¬ 
ion that is shared by many, many knowl¬ 
edgeable observers. Statement (2) is 
probably a statement of fact. I know of 
no announced policy for various “Aid to 
Africa” programs that assumes that the 
intelligence of Africans is inherently and 
permanently limited. At the same time, 
many aid organizations have a realistic 
view of the state of the resources and 
teacher training in African schools. Intel¬ 
ligence test scores are generally lower 
in Africa, as is documented in the text 
of this chapter. Statement (3) is also 
a statement of fact. As references in 
this chapter have shown, in the United 
States the work performance evaluations 
of African Americans are, on the aver¬ 
age, lower than the evaluations received 
by white workers. This is true for both 
objective and subjective evaluations. The 
difference is much less than the differ¬ 
ence in test score averages. However, 
Watson's statement was certainly worded 
in a provocative manner. Statement (4) 
follows from well-established models of 
evolution and genetics. Populations sepa¬ 
rated in space, for many generations, and 
subject to somewhat different selection 
pressures will develop genetic distinc¬ 
tions, within the limits of genetic variabil¬ 
ity that permit viable phenotypes. This is 
as true for intelligence as for any other 
genetically influenced trait. Statement (5) 
is also a fact. Several Walt Disney movies 
notwithstanding, wishing for something 
does not make it so. 

In 2008 I was privileged to attend 
a three-day seminar on “improving the 
brain” that had been arranged by the Cold 
Spring Harbor Laboratory in honor of 
Watson’s retirement. Watson discussed 
the incidents recounted here, with obvi¬ 
ous bitterness. He felt that his remarks 
had been taken out of context. 

The incident illustrates the extreme 
sensitivity of any discussion of racial, eth¬ 
nic, or national differences in intelligence, 
especially if it is linked to a discussion 
of genetics. Watson's remarks, as printed 
(and the London Times has stood strongly 
behind their accuracy), seem to have 
been rather like the famous Rorschach ink 
blot test in clinical psychology. The peo¬ 
ple who dropped him from their speaking 
engagements read a racist interpretation 


into the remarks. I would want to have 
heard the entire interview before I made 
such an interpretation. 

Given such strong sentiments, it is 
hardly surprising that policy makers, 
including many funders of scientific 
research, have backed away from inquir¬ 
ing about racial/ethnic differences in 
intelligence. This has inhibited inquiry 
and rational discussion, especially by 
those who hold an intermediate position 
on heredity-environment issues. 

* October 14, 2007. 

† New York Times, December 1, 2007, p. A17 and 
following. 


have equivalent or slightly higher achieve¬ 
ment; and, depending somewhat on the par¬ 
ticular Hispanic group involved, Hispanic 
students’ achievement falls between those 
of Whites and African Americans. The best 
data relevant to this question comes from 
the NAEP testing program, for an attempt 
is made to make the NAEP samples repre¬ 
sentative of the population of the US K-12 
system, excluding those students who have 
significant mental or physical problems, and 
those students who are not proficient in 
English. 142 The effect is often referred to 
as the “gap” in white-minority educational 
achievement. Figure 11.24 shows the size of 
the gap for seventeen-year-olds during the 
1990-2010 period, and, for comparison, the 
gap in the 1970s. The gap has been constant 
since roughly the 1990s, but is substantially 
smaller than it was in the 1970s. The narrow¬ 
ing of the gap is due to an increase in the per¬ 
formance of African American and Hispanic 
students. This probably reflects the benefits 
of the considerable national effort that has 

142 Grissmer, Flanagan, & Williamson, 1998. The exclu¬ 
sion is appropriate because the purpose of the 
NAEP is to track educational progress in typical 
American schools. NAEP results will understate 
the gap in the US population to the extent that 
African American and Hispanic students are over¬ 
represented in the excluded groups. 


been invested in improving minority educa¬ 
tion since the 1960s. 

The drop in the educational gap over 
time has been mirrored by a similar drop 
in the gap between African Americans and 
Whites on cognitive tests less tied to the edu¬ 
cational curriculum. Charles Murray ana¬ 
lyzed scores from the various standardiza¬ 
tions of the Woodcock-Johnson (W-J) test 
battery that took place in 1976-77, 1986-88, 
and 1996-1999. Recall, from Chapter 2, 
that this test is designed to provide sep¬ 
arate estimates of Gc and Gf. Unlike the 
educational data, the standardization sam¬ 
ple for the W-J tests included people over 
a wide range of ages, and therefore birth 
cohorts. Murray concluded that the gap 
in W-J scores dropped substantially in the 
cohorts born from 1960 to 1970, and then stabilized 
at about .85 deviation units in subsequent 
years. This trend mirrors the change 
in educational achievement scores, for the 
cohorts born from the late 1950s through 
the early 1970s would be the seventeen-year- 
olds tested during the period that the NAEP 
gap was being reduced, and the cohorts born 
from 1970 onward reached the age of seven¬ 
teen during the period that the NAEP gap 
was stable. 

The gap in educational achievement has 
considerable social importance, because of 
Figure 11.24. Mean NAEP scale scores for White, African American 
and Hispanic seventeen-year-olds. Material downloaded from 
nces.ed.gov/nationsreportcard/lttdata, October 2009. 


the emphasis on education in the post¬ 
industrial workplace. Therefore, it is worth 
examining the gap itself and its changes in 
some detail. 

After conducting extensive analyses of 
a variety of national educational assess¬ 
ments, Hedges and Nowell concluded that 
the reduction in the gap was largely due 
to a reduction in the proportion of African 
American students in the lowest ranks of the 
test scores. They did not find an increase in 
the proportion of African Americans in the 
highest ranks. 143 Consistent with this posi¬ 
tion, the achievement gap in the NAEP is 
mirrored by a similar gap in SAT scores, 
which indicates group differences in cog¬ 
nitive skill in the self-selected, but socially 
important, subset of students who intend to 
obtain education beyond high school (Figure 
11.25). This is worrisome for those concerned 

143 Hedges & Nowell, 1998, pp. 159-161, and their 

Tables 5-3 and 5B-2. 


about group equality of rewards, for social 
and economic rewards are associated with 
higher levels of cognitive skills (see Chap¬ 
ter 10). 

Although the White-Black test score gap 
is present in test scores of very young chil¬ 
dren, it widens markedly over the school 
years. This can be used to argue that either 
(a) African American children are not well 
prepared for school or (b) the schools are 
selectively failing to educate African Amer¬ 
ican children. A good deal of light was 
shed on this issue in a tightly argued paper 
by Meredith Phillips, of the University of 
Chicago, and two of her colleagues, James 
Crouse and John Ralph. 144 

On the average, Black students enter ele¬ 
mentary school with fewer of the skills 
required to benefit from formal instruction 
than do White students. Phillips and col¬ 
leagues pointed out that it is important to 

144 Phillips, Crouse, & Ralph, 1998. 

Figure 11.25. White, African American, and Hispanic mean scores 
on the SAT. The within-group standard deviation for this test is 
slightly over 100, making the White-African American gap 
approximately one d unit. 


distinguish between the possibility that 
Black students fall further and further 
behind as they progress through the schools 
from the possibility that all students, of 
whatever race or ethnicity, who are ini¬ 
tially poorly prepared fall further and fur¬ 
ther behind. 

Figure 11.26 presents the results of some 
of Phillips and colleagues' analyses of the difference 
between Black and White achievement 
in reading and mathematics. The figure 
shows the size of the gap as computed 
directly from test scores at the beginning 
of second grade and at the end of high 
school, either with or without corrections 


for prior measures of achievement: measures 
taken on entering school (second graders, corrected for 
first-grade scores) and measures taken in the 
ninth grade and earlier (twelfth graders, corrected 
for scores through the ninth grade). 
The size of the achievement gap was sub¬ 
stantially reduced, although not quite elimi¬ 
nated, by considering the effects of the prior 
scores. Evidently differential reactions to 
schooling associated with race per se are only 
a small part of the causal factors that lead to 
a Black-White gap in student achievement. 

Of course, this does not reduce the seri¬ 
ousness of the situation, but it does cast it 
in a different light. The effect of schooling 


Figure 11.26 bars: mathematics and reading gaps at grades 2 and 12, each shown with no controls and with controls for prior scores. Horizontal axis: White mean - Black mean, d units. 


Figure 11.26. Difference in deviation score units between Black and 
White students' scores on achievement tests taken in the second 
and twelfth grades. The effect is shown with and without controls 
for differences between students in reading and mathematical skills 
assessed earlier. Data from Phillips, Crouse, & Ralph, 1998, Tables 
7-4 and 7-5. 





























is certainly to increase cognitive skills. In 
the process of increasing those skills school¬ 
ing increases the range of variation between 
those who were or were not initially well 
prepared for schooling. 

11.4.6. Group Differences in Cognitive 
Skills in the Workplace 

Three sources of data are available to 
study group differences in workplace per¬ 
formance: the test scores obtained by people 
applying for jobs, supervisors’ ratings of on- 
the-job performance, and objective indices 
of performance. One can argue that the best 
measures of workplace performance would 
be the objective indices, that the supervi¬ 
sor’s ratings would be next best, and that the 
test scores would be only indirect evidence 
of performance. Unfortunately the availabil¬ 
ity of the data produces the opposite rank¬ 
ing; it is fairly easy to study test scores and 
relatively expensive to obtain objective per¬ 
formance measures. 

A great many of the studies of applicant 
test performance use data from the Wonderlic 
Personnel Test (WPT) and its current version, 
the WPT-R. As a brief reminder, people 
who take the WPT have to switch very 
rapidly from doing one simple task (simple 
mathematics word problems) to another 
(analyzing logical statements). The result is 
a sort of mental gymnastics for the working 
memory complex, as tasks and algorithms 
have to be switched rapidly. Not surpris¬ 
ingly, the WPT is a reasonable indicator 
of general intelligence. As it takes only 
twelve minutes, it is a cost-effective screen¬ 
ing instrument. 

Philip Roth, a professor at Clemson Uni¬ 
versity, and his colleagues have conducted a 
meta-analysis of the WPT and some other 
data on applicant test scores. 145 When they 
combined the results from all jobs they 
found a White-Black d value of .99, and a 
White-Hispanic d of .83. Within jobs their 
estimate of the White-Black difference was 
.83 for jobs of low complexity, .72 for jobs of 
medium complexity, and .63 for jobs of high 

145 Roth et al., 2001. 


complexity. The reduced d values could be 
expected on the basis of self-selection. It was 
not possible to analyze the White-Hispanic 
difference by job complexity, due to lack of 
data. 

Meta-analyses have also been conducted 
on performance ratings. 146 The advantage in 
job performance, d ~ .3 in favor of Whites, is 
substantially less than the White advantage 
on test performance, d ~ 1. Black-White differences 
in job performance are larger when 
performance is evaluated by objective cri¬ 
teria, such as work samples and tests of job 
knowledge, than when supervisor ratings are 
used. The d values for such measures are in 
the .4-.5 range. This compares to d values 
for ratings of interpersonal skills, where the 
differences fall closer to .2. 147 

There is no contradiction between the 
findings on test performance and job perfor¬ 
mance. Within the population of applicants 
for a particular job differences in test scores 
are in the d = .7-.8 range. Predictive validity 
correlations range from .3 for jobs of low 
cognitive complexity, to .6 for jobs of high 
complexity. Multiplying .75 by either .6 or 
.3 provides a predicted d of either .45 or .22, 
depending upon the cognitive loading of the 
job and the performance rating. 
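
The arithmetic in this paragraph can be written out as a short sketch. It assumes, as the text does, that the expected gap on the criterion is the product of the gap on the predictor and the predictive validity. The function name `predicted_gap` is invented for illustration and does not come from any of the cited studies.

```python
def predicted_gap(test_d, validity):
    """Expected gap in job performance (d units) implied by a gap in test scores.

    Assumes the group difference in performance arises only through the tested
    skills, so the gap shrinks in proportion to the predictive validity.
    """
    return test_d * validity

applicant_d = 0.75  # midpoint of the .7-.8 within-job range given in the text

for label, validity in [("low complexity", 0.3), ("high complexity", 0.6)]:
    print(f"{label}: predicted d = {predicted_gap(applicant_d, validity):.3f}")
```

The two products are .225 and .45, which the text rounds to .22 and .45.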

Another way to estimate workplace suc¬ 
cess is to look at income. Income is impor¬ 
tant in itself, but it is only loosely tied to per¬ 
formance within an occupation, for many 
other variables, such as the type of position, 
noneconomic rewards, seniority, and area of 
the country also influence income. 

White incomes are generally higher than 
the incomes of African Americans and Lati¬ 
nos. Based on their study of the NLSY79 
data, Herrnstein and Murray claimed that 
the disparity between Black and White 
incomes virtually disappears after statisti¬ 
cally controlling for intelligence, as indexed 
by AFQT scores. 148 However, they did 
not report separate analyses for men and 
women. A more detailed analysis of the 

146 McKay & McDaniel, 2006; Roth, Huffcut, & Bobko, 

2003. 

147 McKay & McDaniel, 2006. 

148 Herrnstein & Murray, 1994, pp. 324-325. 
NLSY79 data found that the relation 
between income and AFQT scores was 
essentially identical across the three racial 
groups - Whites, Blacks, and Hispanics - 
after making separate analyses for men and 
women. The same analyses indicate that the 
variation in income associated with AFQT 
scores was much smaller than Herrnstein 
and Murray had claimed that it was. 149 

In summary, it appears that there are gaps 
in job performance between Whites, African 
Americans, and Latinos (to the extent data 
is available), and that these gaps are about 
what would be predicted from the gaps 
in cognitive test scores and the correlation 
between test scores and performance. 

11.4.7. The Case of Asian-Americans 

Asians, a minority in America, but a plu¬ 
rality in the world, present a very differ¬ 
ent picture. Asian-Americans, as a group, 
have slightly higher cognitive test scores 
than do Whites. This is true both today 
and at a much earlier time, when Asian- 
Americans tended to have lower SES than 
they do today. They also have a pronounced 
pattern to their differences, on both intel¬ 
ligence tests and markers of educational 
achievement. 

The fact that Asian-Americans tend to 
have high scores on intelligence tests is not 
a new observation. In the 1920-30 period 
Stanley Porteus (1883-1972), a professor of 
Psychology at the University of Hawaii, 
collected data showing that Japanese- 
Americans had higher intelligence test 
scores than did other groups. 150 At that time 
Japanese-Americans in Hawaii were consid¬ 
erably lower on the SES scale than they are 
today, so the relatively high test scores rep¬ 
resented a reversal of the usually observed 
positive relation between IQ scores 
and SES. 

The statement that Asian-Americans 
have higher average scores masks an impor¬ 
tant difference in the pattern of scores. 

149 Cawley et al., 1997, especially Table 8.5, and Cavallo, 
el-Abadi, & Heeb, 1998, Table 8.6. See also Figure 
10.7 in this volume and the accompanying text. 

150 Porteus, Dewey, & Bernreuter, 1930. 


Asian-Americans tend to do better than 
Whites in Mathematics and visual-spatial 
reasoning, and tend to be somewhat below 
Whites in verbal scores. Figure 11.27 illus¬ 
trates a typical finding, scores on the SAT for 
the period 1997-2007. Black scores are shown 
for comparison. The White-Asian difference 
is clear, and is also clearly much smaller than 
the Black-White difference. 151 

The US results are mirrored on the inter¬ 
national scene. Of the forty-eight coun¬ 
tries participating in the International Math¬ 
ematics and Science Studies (TIMSS) in 
2007, the top five countries were, in order 
of eighth-grade mathematics scores: China 
(Taipei), South Korea, Singapore, China 
(Hong Kong), and Japan. All five countries, 
and only those five countries, had test scores 
that were reliably higher than the scores for 
the United States. The PISA assessments 
show a similar picture. In the 2007 PISA 
mathematics assessments three out of the 
top five countries - Japan, South Korea, and 
Hong-Kong (People's Republic of China) - 
were from northeast Asia. While northeast 
Asian countries generally did well in the 
reading assessments, they were not nearly 
as dominant as they were in mathematics. 
Only Korea retained a position in the top 
five. 152 

These remarks refer strictly to northeast 
Asians - people who emigrated from, or 
whose forefathers emigrated from, China 
(including Taiwan), Japan, or Korea. The 
recent emigrations from Southeast Asian 

151 People who take the SAT are a self-selected pop¬ 
ulation, so this comparison might be contaminated 
by recruitment effects, as discussed in section 11.1 
of this chapter. In this case, however, recruitment 
probably works to underestimate the differences 
in the population, because the probability that an 
Asian-American or Black student will attempt to 
take the SAT is higher than the probability that a 
White student with equivalent high school exam¬ 
ination scores will take the SAT. (Calculations by 
Tara Madhyastha and the author, based upon com¬ 
parisons of the frequencies of various ethnic groups 
at different NAEP scoring levels and in the self- 
selected group taking the SAT.) 

152 Hong Kong was tenth and Japan was tied with 
the tiny Principality of Liechtenstein for the thirteenth 
position. Source: National Center for Education 
Statistics, Digest of Education Statistics 2007, 
Table 389. 
Figure 11.27. SAT scores for three racial/ethnic groups for the 
period 1997-2007. The top panel shows scores on the SAT-M (M 
code), the bottom shows the SAT-R (R code) scores. Source: 
College Board. 


nations, such as Thailand, India, Indonesia, 
and the Philippines, are not reflected in most 
of the statistical comparisons. In fact, the 
current US census and educational practice 
of lumping all the Asian countries together, 
and further adding in emigrants from the 
Pacific Islands, is bound to confuse the statis¬ 
tical summaries. This is going to be a serious 
problem in the future, for there are studies 
indicating that educational performance and 
relevant social conditions vary considerably 
across different Asian groups. 

In summary, there is little doubt that IQ 
scores and educational data present a con¬ 
sistent ordering of the major racial/ethnic 
groups in the United States, with (northeast) 
Asians slightly ahead of Whites, Hispanics 
about .5 to .7 deviation units below 
Whites, and African Americans from .8 
to 1.1 deviation units below Whites. The 


picture has changed over time, but slowly. 
Two questions remain to be addressed: does 
this ordering matter, and if it does, what are 
its causes? 

11.4.8. Are the Tests Valid for Various 
Racial and Ethnic Groups within the 
Developed Societies? 

There have been numerous claims that cog¬ 
nitive tests do not fairly assess minorities. 
In one of the most famous of these disputes, 
Larry P. v. Riles, 153 a federal court 
forbade the use of intelligence tests to clas¬ 
sify students as “educable mental retardates” 
and then channel them into special classes. 
(Ironically, eighty years earlier the French 

153 Larry P. v. Riles, U.S. Court of Appeals, 1984, 
793 F.2d 969 (9th Circuit). 
Ministry of Education had commissioned 
Binet to develop tests for this purpose.) 
The courts’ argument was that the tests 
were illegally discriminatory because when 
they were used a disproportionate percent¬ 
age of African American children received 
IQ scores below 70, one of the criteria for 
retardation. 

The reasoning used to win the Larry P. 
case is typical of the arguments made against 
relying on the various cognitive tests as valid 
indicators of the mental abilities of minor¬ 
ity group members. Under what conditions 
might the argument be valid? 

Return to the distinctions among three 
sets of cognitive skills: assessed relevant skills, 
those used both to take the test and to per¬ 
form in school or the workforce; assessed 
irrelevant skills that can be used to take the 
test but are irrelevant to school and work¬ 
place performance; and unassessed relevant 
skills, cognitive (or other) skills that are rele¬ 
vant in the school and workplace, but are not 
evaluated by the tests. If only assessed rele¬ 
vant skills existed, the correlation between 
test scores and criterion performance in the 
school or workplace would approach 1.0. 
This, of course, is an unrealistic criterion, 
for no one thinks either that the tests are 
perfect or that success is determined solely 
by cognitive characteristics. An argument 
against testing different racial/ethnic groups 
cannot be made on the basis of imperfect 
testing. 

The argument that the tests are not fair 
(to African Americans, Hispanics, or any 
other group) cannot rest on differential dis¬ 
tributions of assessed relevant skills in the 
White and minority communities, because 
any prediction based only on these skills is 
a fair prediction. The argument has to be 
made on the basis of assessed irrelevant skills 
or unassessed relevant skills. Two conditions 
are possible: 

a. the members of the affected minority 
group are less likely to possess assessed 
irrelevant skills than Whites are, or 

b. the affected minority group members 
are more likely to possess unassessed rel¬ 
evant skills than Whites are. 


If either of these conditions holds, the tests are
unfair, either because Whites receive a boost
due to their narrow test-taking skills, condition (a),
or because minority group members
do not receive appropriate credit for their
unassessed relevant skills, condition (b).

Suppose condition (a) exists. Minority 
group members will have spuriously lower 
test scores than Whites because the White 
scores are elevated by their assessed irrelevant
skills. The correlation between test
scores and criterion performance, calculated
only for White examinees, should be
lower than the same correlation calculated 
within the minority group, because the 
White scores are affected by factors irrel¬ 
evant to school and workplace (criterion) 
performance, while the minority scores are 
not. If a common prediction equation is 
developed for all examinees, regardless of 
race or ethnic status, the equation should 
overpredict White performance because the 
White scores are systematically and erro¬ 
neously inflated. Minority performance will 
be underpredicted. 

Suppose condition (b) exists: minori¬ 
ties exceed Whites in the possession of 
unassessed relevant skills. The correlation 
between test scores and criterion perfor¬ 
mance will be lower in the minority group 
than in Whites because the minority cri¬ 
terion performance scores have a source 
of variance that the White scores do not. 
If a common prediction equation is used, 
minority group performance will again be 
underpredicted, due to the minority group 
members’ having unassessed but relevant 
cognitive skills. 

In either case of unfairness, minority 
group scores should be underpredicted by an 
equation based on scores from all groups - 
the common prediction equation. Depend¬ 
ing upon which case of unfairness occurs, 
the correlation between test and criterion 
performance may be either higher or lower 
when calculated for minority group mem¬ 
bers alone than for Whites alone. 
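
The logic of condition (b) can be checked with a small simulation. The sketch below is illustrative only: the group labels A and B, the effect sizes, the noise levels, and the function names are all assumptions chosen for the example, not values from any study. Group B is given an extra, untested skill that contributes to the criterion; a single regression line is then fitted to everyone, and the mean prediction error and the test-criterion correlation are reported per group.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def simulate_condition_b(n=5000, bonus=0.5, seed=0):
    """Hypothetical data: only group B has unassessed relevant skills."""
    rng = random.Random(seed)
    data = {}
    for group, has_unassessed in (("A", False), ("B", True)):
        tests, crits = [], []
        for _ in range(n):
            assessed = rng.gauss(0, 1)  # skill seen by both test and criterion
            unassessed = rng.gauss(bonus, 1) if has_unassessed else 0.0
            tests.append(assessed + rng.gauss(0, 0.5))
            crits.append(assessed + unassessed + rng.gauss(0, 0.5))
        data[group] = (tests, crits)

    # Common prediction equation: one line fitted to all examinees pooled.
    all_tests = data["A"][0] + data["B"][0]
    all_crits = data["A"][1] + data["B"][1]
    slope, intercept = fit_line(all_tests, all_crits)

    summary = {}
    for group, (tests, crits) in data.items():
        residuals = [c - (intercept + slope * t) for t, c in zip(tests, crits)]
        summary[group] = {
            "mean_residual": sum(residuals) / n,  # > 0 means underpredicted
            "r": pearson_r(tests, crits),
        }
    return summary


if __name__ == "__main__":
    for group, stats in simulate_condition_b().items():
        print(group, round(stats["mean_residual"], 2), round(stats["r"], 2))
```

Under these assumptions the group with the untested skill should show a positive mean residual (underprediction) and a lower within-group correlation, which is the signature described for condition (b).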

This definition of fairness is restricted in 
two ways. The prediction equation must act 
only on test scores. If the prediction equation
includes an adjustment for racial/ethnic
status, by adding or subtracting a constant,
predictions, but not correlations, will
be adjusted appropriately. Whether or not 
such an adjustment should be made is a 
legal/ethical question rather than a scientific 
one, and will not be further discussed. 

Psychometric fairness has been defined
entirely on the basis of statistical accuracy 
and skill possession. It is irrelevant to larger 
issues of social justice. It could be that when 
test scores are used to predict performance 
the appropriate equation systematically pre¬ 
dicts lower criterion scores for minority 
group members than for Whites, and that 
in fact they have lower criterion scores, 
because they have not had the same oppor¬ 
tunities Whites have had to acquire the 
necessary assessed, criterion-relevant skills. 
Such a situation is indeed unfair, in a global 
sense, and some form of affirmative action 
may be appropriate. Blaming an IQ or other 
cognitive test for this situation amounts to 
shooting the messenger. A test can only
assess skills possessed at the time of testing;
it cannot tell us how those skills came to be
possessed.

How do the facts stand up to the criteria 
of fairness? 

There is scant evidence that educational 
assessment tests are unfair, in the sense 
just defined. The validation data for the 
SAT provide a clear case. The correlation 
between SAT scores and first-year grade 
point average, the usual criterion for such 
studies, is .53 for Whites, .50 for Hispanics,
and .47 for Blacks. While these small
differences are statistically significant (with 
sample sizes of 10,000 or more per group), 
it is hard to imagine a situation where 
they would be of practical importance. 
What is even more telling is that African 
American and Hispanic grades in college 
are systematically overpredicted. This is 
not consistent with either condition of 
unfairness. 154 
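
The claim that such small differences in correlation are statistically significant at these sample sizes can be illustrated with Fisher's r-to-z test for two independent correlations. This is a rough sketch, not the analysis used in the cited study: the function name is hypothetical, the sample size of exactly 10,000 per group is an assumed round figure, and the test ignores the fact that the reported correlations were corrected for restriction in range.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se


# Correlations from the text; n = 10,000 per group is an assumed round figure.
z = fisher_z_difference(0.53, 10_000, 0.47, 10_000)
shared_variance_gap = 0.53 ** 2 - 0.47 ** 2  # difference in r squared
print(f"z = {z:.2f}, difference in r^2 = {shared_variance_gap:.3f}")
```

With samples this large even a .06 difference in correlation yields a z statistic well beyond the conventional 1.96 cutoff, while the difference in variance accounted for remains modest.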

Differences between White and minor¬ 
ity group academic performance (White- 
minority group) are on the order of d = .40 

154 Mattern et al., 2008, Table 2. Correlations have been
corrected for restriction in range.


for African Americans and .30 for Hispanics.
These figures are substantially smaller than
the corresponding differences in test performance,
d = 1.0 and d = .6-.7. 155 This is
what one would expect, because the tests
are imperfect predictors. Assuming a .5 predictive
validity for educational tests such as
the SAT, the expected d values for academic
performance are .50 and .30-.35, not far from
the observed differences.
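
The arithmetic behind these expected values is simple: if a common regression line with standardized slope equal to the predictive validity applies to every group, the expected gap on the criterion is the validity times the gap on the test. A minimal sketch follows, with a hypothetical function name and the simplifying assumption of equal within-group standard deviations.

```python
def expected_criterion_d(test_d, validity):
    """Expected standardized gap on the criterion under a common regression line."""
    return validity * test_d


validity = 0.5  # predictive validity assumed in the text
for label, test_d in (("d = 1.0", 1.0), ("d = .6", 0.6), ("d = .7", 0.7)):
    print(label, "->", round(expected_criterion_d(test_d, validity), 2))
```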

I have found relatively few industrial 
studies that have reported a contrast 
between the predictive accuracy of tests 
for Whites and minority group members. 
A 2008 discussion of this issue by a knowl¬ 
edgeable researcher in the field 156 cited a 
study almost twenty years earlier as evidence 
that there is no difference in the predictive 
accuracy of industrial tests across racial and 
ethnic groups. The study in question 157 was 
a National Academy of Science review of 
yet earlier studies, involving a single test, 
the Department of Labor’s General Apti¬ 
tude Test Battery (GATB). Its authors con¬ 
cluded that test scores were equally predic¬ 
tive for both groups. 158 In my own searches 
I found one (1) recent study, of white- 
collar/managerial performance in a govern¬ 
ment agency. It concluded that various 
predictors of job success, including but not 
limited to cognitive tests, behaved in the 
same way for Black and White employees. 159 

In sum, the tests are accurate predic¬ 
tors of achievement for the three major 
racial/ethnic groups in the United States - 
Whites, African Americans, and Hispanics. 
The data for Asian-Americans is too scanty 
to make a firm statement, although there 
is some evidence that academic scores are 
underpredicted. Since test scores are higher 

155 Sackett, Borneman, & Connelly, 2008.

156 Ibid. 

157 Hartigan & Wigdor, 1989.

158 Many investigators must have had access to data 
relevant to this issue. For example, military studies 
have been reported in which it would have been 
impossible not to have obtained data that could be 
used to compare predictions across groups. Consid¬ 
ering the amount of controversy over the question 
of test bias, it is surprising that the comparisons 
were not made. 

159 Pulakos, Schmitt, & Chan, 1996.

for Asian-Americans than for Whites, this
suggests that unfairness condition (b) holds:
Asian-Americans may have more unassessed 
relevant cognitive skills than do Whites. 

The differences in test scores across 
racial/ethnic groups almost certainly reflect 
a real difference in the distribution of cog¬ 
nitive skills across racial/ethnic lines. What 
might cause the difference? 

11.4.9. Social Causes for Disparities in
the Distribution of Intelligence across 
Racial/Ethnic Lines 

Neither I nor anyone else knows the cause 
of the differences in indices of intelligence 
among various racial and ethnic groups. Fur¬ 
thermore, there almost certainly is not any 
single cause, and the causes may vary for dif¬ 
ferent comparisons. We do have some leads. 

Not surprisingly, the explanations fall 
into two broad categories - social and biolog¬ 
ical. The social ones will be discussed here, 
the biological ones in the next section. 

EFFECTS THAT MAY ALTER THE 
INTERPRETATION OF TEST SCORES 
IN AFFECTED GROUPS 
There have been a number of claims that the 
tests used for personnel screening are unfair 
because they contain questions that draw 
on cultural information not available, or less 
available, to Blacks, Hispanics, and Asians. 
While it is not generally acknowledged, this
claim has to be made with the admission
that the relevant cultural information has
predictive value for performance, for otherwise
it would amount to unfairness condition (a),
as defined in the previous section,
and we have seen that condition (a) cannot
be maintained. The argument has to be the
“social justice” argument that the minority
group members are less able to prepare for
the tests than Whites are.

Arthur Jensen is one of the strongest 
critics of this claim. 160 He has applied the 
method of correlated vectors, with g load¬ 
ings replaced by expert judgment of the 

160 This is discussed extensively in Jensen’s 1980 book 
on the topic. Further discussions of more recent data 
can be found in Jensen, 1998, 359-367.

extent to which questions or tests are culturally
loaded, and concluded that the cultural
loadings do not predict the extent
of group differences on tests, subtests, and 
individual questions, while the g loadings 
do. Most of the studies Jensen considered 
compared Whites to African Americans in 
the United States. A substantial European 
study by Michelle Helms-Lorenz and her 
colleagues at the University of Tilburg, in 
the Netherlands, 161 which also used the
method of correlated vectors, found that 
cultural loadings rather than g loadings were 
the best predictors of differences between 
racial/ethnic groups of Dutch children. 
The Dutch and the American studies do 
not necessarily conflict, for different ethnic 
groups were involved. In particular, many of 
the African immigrants to the Netherlands 
are relatively recent arrivals, while African 
Americans in the United States, with a few 
exceptions, have a history that goes back 
centuries. 

In addition to conducting analyses of 
racial/ethnic differences on battery type 
tests, Jensen has pointed out that White- 
Black differences are found on tests that 
do not have obviously large cultural biases. 
For example, progressive matrix tests do not 
have obvious culture loadings, yet they pro¬ 
duce fairly large racial/ethnic differences. 

The matter is somewhat up in the air. 
However, the indeterminacy tells us something.
If cultural unfairness were a major
cause of racial/ethnic differences in test per¬ 
formance, we would not have as much trou¬ 
ble detecting it as seems to be the case. This 
does not mean that culture plays no part in 
determining test scores, particularly if the 
cultural differences are marked. For exam¬ 
ple, I would be suspicious of the use of the 
tests to evaluate intelligence in recent immi¬ 
grants to a developed nation, although test 
scores might be useful as an index of assim¬ 
ilation. Whether it was desirable to use a 
valid, but in one sense “culturally unfair,” 
test to screen applicants for jobs or academic 
programs would be a policy issue, not a sci¬ 
entific question. 

161 Helms-Lorenz, van de Vijver, & Poortinga, 2003.

This point was illustrated in an analysis
of several studies of the workplace perfor¬ 
mance of various immigrant groups in the 
Netherlands. 162 Immigrants did indeed have 
lower test scores than ethnic Dutch. They 
also had lower scores in academic achieve¬ 
ment and on a variety of workplace per¬ 
formance ratings. (Note the similarity to 
the American studies discussed earlier.) The 
authors then contrasted test scores and aca¬ 
demic and workplace performance of suc¬ 
cessive generations of immigrants. The dif¬ 
ference between the Dutch and non-Dutch 
groups was reduced in the second genera¬ 
tion. As the immigrant-derived groups had 
more experience with the Dutch society 
they acquired the cognitive skills it valued. 
The test scores responded to the presence of 
these skills, and so did academic and work¬ 
place performance. There was a socially 
induced change in intelligence. 

This brings us to a slightly different view 
of cultural bias on a measure of intelli¬ 
gence. If one is trying to understand or pre¬ 
dict success in a certain society - usually a 
post-industrial society - then one’s concern 
should be with the measurement of intelli¬ 
gence as defined by that society. Therefore, 
a certain amount of cultural bias in a test is 
appropriate. How much depends upon the 
situation. Troubles occur only if an inves¬ 
tigator tries to define intelligence as some¬ 
thing that is invariant across cultures. 

A second argument for the idea that test 
scores are biased against minorities is that 
the testing situation itself is biased. Some 
years ago it was claimed that minorities, 
and especially minority children, would not 
communicate adequately with White exam¬ 
iners. Very little evidence has surfaced to 
support this idea, and in any case it would 
not apply strongly to a group testing situ¬ 
ation. A more serious concern has to do 
with motivation. The idea is illustrated by 
the stereotype threat argument - that if 
minorities (or women) perceive a test as 
the sort of test on which they should do 
poorly, they will not put as much effort into 


162 Te Nijenhuis et al., 2004. 


solving test questions. 163 The argument here 
is exactly the same as the argument pre¬ 
sented to explain the fact that women tend 
to have lower scores than men on mathe¬ 
matics and visual-spatial tests. And the evi¬ 
dence is the same. Stereotype threat can 
be demonstrated in laboratory settings, 164
but not in field situations where success 
or failure has real consequences for the 
examinees. 165 The use of the MGCFA tech¬ 
nique to evaluate stereotype threat effects in 
high-stakes testing has been suggested, but 
at present no such studies are available. 166 

A third argument is that minority group 
members may lack culturally relevant cogni¬ 
tive skills that are evaluated by conventional 
tests, but that they possess other important 
cognitive skills that are not evaluated by the 
tests. This is condition (b) in my categorization
of unfair testing. It implies that the tests
should underpredict minority performance. 
Such underprediction has not been reported 
for African Americans or Hispanics, either 
in academia or the workplace. There is a 
suggestion that condition (b) may apply to 
some Asian-Americans. 

The claim that test scores are irrelevant or 
inaccurate in predicting minority group per¬ 
formance is contradicted by the evidence. 

SOCIAL EXPLANATIONS: DEVELOPMENT 
Some explanations for racial and ethnic dif¬ 
ferences in intelligence accept the validity of 
test scores as indicators of actual differences, 
and then seek to explain these differences 
in terms of social causes acting on cognitive 
development. In order to understand these 
explanations it helps to step back and con¬ 
sider how we develop intelligence. 

Intelligence refers to the possession of a 
set of cognitive skills and knowledge that are 
useful in post-industrial society. These skills 
are not inherited directly, in the sense that 
eye color is inherited. They are produced by 
an interaction between a person’s learning 

163 Steele & Aronson, 1995, 1998; Suzuki & Aronson, 2005.

164 Nguyen & Ryan, 2008. 

165 Cullen, Hardison, & Sackett, 2004; Sackett, Hardison,
& Cullen, 2004; Stricker & Ward, 2004.

166 Wicherts, Dolan, & Hessen, 2005.

capacity, which is initially genetically determined
but soon influenced by what the person
already knows, and social supports that
facilitate the learning process. Initially these 
social supports are largely within-family, but 
as a person grows older the support net¬ 
work ranges from kindergarten to the uni¬ 
versity, church, and workplace, and from 
exploration of the playground to exploration 
of the World Wide Web. Cognitive skills are 
required to utilize these resources, so there is 
an important positive feedback loop. What 
you know already is a powerful predictor of 
what you are about to learn. 

Human learning is an active process. It 
depends upon both the potential offered by 
the environment and the student’s motiva¬ 
tion and skill in interacting with the environ¬ 
ment in a way that furthers the acquisition 
of learning. These statements apply equally 
to learning calculus, learning how to cook a 
souffle, and learning how to hit a backhand 
overhead shot in tennis. 

The next step is to apply these statements 
to racial/ethnic differences in intelligence 
and academic achievement. 

Parental socioeconomic status (SES) is 
commonly pointed to as an indicator of 
social support (or lack thereof). Parental
SES is confounded with racial/ethnic sta¬ 
tus. This is particularly true for recently 
arrived immigrants, who also may not be 
familiar with the dominant language. Chap¬ 
ter 9 reviewed the substantial evidence, from 
cultures as diverse as the rural Philippines 
and the American Midwest, showing that 
children from families that are disrupted 
and/or socioeconomically stressed are read 
to less, are less encouraged to do problem 
solving on their own, and generally tend to 
have lower intelligence test scores, even in 
the pre-school years. Home environments 
count. 

Because home environment is strongly 
correlated with SES, one could argue that 
to a great extent the observed racial/ethnic 
disparities in cognitive skills are actu¬ 
ally SES effects. This proposal has to be 
taken seriously, for the correlation between 
race/ethnicity and SES is substantial. In 2006
the percentages of families with children
under eighteen who were below the poverty
line were as follows:

White ...... 9.5%
Black ...... 33.0%
Hispanic ... 26.6%
Asian ...... 12.0%

A lack of financial resources substantially 
constrains the amount of time and effort that 
adults can give to parenting. This undoubt¬ 
edly takes its toll on the children. Recall that 
the gaps in cognitive achievement toward 
the end of high school can be predicted, on 
an individual basis, by the cognitive gap at 
the start of elementary school. The original 
gap is closely related to SES. 

Nevertheless, there are three reasons for 
not accepting SES as the sole explanation 
for racial/ethnic disparities in intelligence. 
The first is statistical: SES measurements do 
not fully account for the differences in intel¬ 
ligence measures between groups. An SES 
argument also fails to account for the fact 
that the differences between groups on cog¬ 
nitive measures tend to be large for “cultur¬ 
ally reduced” tests, such as the RPM tests. 

Second, SES is itself confounded with 
intelligence. We then get a chicken-and-egg
phenomenon. For children, parental SES is 
confounded with parental intelligence, and 
since intelligence is heritable to a significant 
degree, it is not clear whether an effect of 
parental SES on children's intelligence is due 
to social or biological inheritance. 

Finally, and to me most compelling, 
socioeconomic status is a statistical abstrac¬ 
tion. As such, it cannot cause anything. 
Intellectual development depends upon 
physical and social variables within the envi¬ 
ronment that are correlated with SES. 

What might some of these variables be? 
One class of variables has already been 
implicated: parental practices. It has been 
estimated that in order to participate fully 
in school a child has to have learned to 
read approximately 9,000 words by the third 
grade. The child also has to have a good 
knowledge of the syntactical and semantic 
rules of the language that will be used in the 
school. A substantial portion of this knowledge
is acquired via parental interaction,
such as reading to children. 167 While there
is less data for other skills, such as encouraging
problem solving (at least that I am aware
of), the same thing is likely to be true of elementary
mathematics (e.g., counting) and
general problem-solving skills.

Parenting practices are required to 
draw out biological potentials in young 
children. 168 The extent to which these are
developed at the start of school is highly 
predictive of later development in school. 
As a general statement, for there obviously 
are exceptions, Black, White and Latino par¬ 
ents with low SES seem to be low on the 
time-consuming parental skills required to 
guide children toward independent problem 
solving. Asian parents with similar low SES 
give the practice of these skills high prior¬ 
ity. This seems to be particularly true of 
Asian-Americans from the northeast Asian 
nations - China, Korea, and Japan. 169

If this argument is correct, it ought 
to be possible to account for the gap in 
Black-White performance by considering 
the effect of these variables. This question 
was addressed in an analysis of data from the 
Panel Study of Income Dynamics (PSID), a
longitudinal study being conducted by the 
Institute for Social Research of the Univer¬ 
sity of Michigan. In 1997 extensive data was 
collected on the families involved who had 
dependent children age twelve and under. 
The children's cognitive skills were mea¬ 
sured using age-appropriate tests, including 
several of the tests in the Woodcock- 
Johnson battery. Parental income and occu¬ 
pation and household wealth were mea¬ 
sured. Mother’s verbal skills were measured, 
using the paragraph comprehension subtest 
of the Woodcock-Johnson battery. This is a 
good marker for Gc. In addition, a number 
of measurements were made on the home 
environment itself, including parenting style 
and the quality of adult interactions with 
children. 

The typical Black-White gap appeared 
when the children’s cognitive scores were 

167 Wolf, 2007, Chapter 4. 

168 Collins et al., 2000. 

169 Corwyn & Bradley, 2008; Steinberg, 1996. 


considered alone. However, it was greatly 
reduced, and no longer statistically reliable, 
after the family and home characteristics 
were considered. The three best predictors 
were occupational status of the head of 
the household (the higher prestige of the 
occupation, the better the child’s scores), 
mother’s score on the verbal comprehen¬ 
sion test (the higher the mother’s score, the 
higher the child’s score), and the measures 
of the nature of the interactions with the 
child. 170

The variables that best predict a child’s 
scores are variables correlated with racial 
status. This is shown in Figure 11.28, which 
presents the d values associated with the 
four measures that were found most pre¬ 
dictive of children’s cognitive test scores. 
Black and White families differ on the key 
variables. This is an illustration of the gen¬ 
eral point that socioeconomic characteris¬ 
tics, rather than racial identity per se, are 
determiners of the Black-White gap in fairly 
young children. 

What about group differences in older 
children and adults? The influence of fam¬ 
ily environments is diminished as children 
age. Therefore, we want to look at the 
interactions between students in different 
racial/ethnic groups and their schools, for 
the development of intelligence during this 
period depends on three things: biological 
potential (which can never be gainsaid), the 
quality of instruction, and the way in which 
the student interacts with the instruction. 

Laurence Steinberg, a professor at Tem¬ 
ple University, and his colleagues have con¬ 
ducted extensive studies of the attitudes 
of adolescents toward school. 171 Steinberg’s
group reached the following conclusions: 

1. The dominant attitudes of the White 
peer group discourage high levels 
of school achievement. Conspicuous 
efforts toward studying are frowned 
upon. However, poor academic perfor¬ 
mance is also looked down upon. In 
general, it is OK to do well in school, 

170 Yeung & Conley, 2008. 

171 Steinberg, 1996.

Figure 11.28. The effect sizes for White-Black contrasts on selected
variables shown to predict the Black-White contrast on tests of 
cognitive skills. All effect sizes are reliable at the .05 level or lower. 
Calculations based on Yeung & Conley, 2008, Table 3. The 
selection of variables was based on regression analyses presented in 
that paper. 


provided that you do not appear to be 
working hard at it. 

2. Black and Hispanic students claim 
that high achievement would be valu¬ 
able, but do not seem to be threat¬ 
ened by poor achievement. Consider¬ 
able emphasis is placed upon behaviors 
that display group solidarity; “hanging 
out” is seen as important, studying is 
not. 

3. Asian-American students believe in the 
importance of high achievement, and 
expect that it will be achieved only by 
sustained effort. This group was the only 
group that reported substantial discus¬ 
sion of class work between peers. The 
effect is highest in children who are 
immigrants or whose parents are immi¬ 
grants from the three northeast Asian 
countries. 

John Ogbu, an anthropologist at the 
University of California, Berkeley, reached 
much the same conclusion on the basis of 
ethnographic analyses of the attitudes of 
Black adolescents from well-to-do homes in 
a similar city. Support for Ogbu’s conclusion 
was obtained in a survey of student feel¬ 
ings about high academic achievement. In 
general, Black tenth-graders saw social costs 
to academic achievement, especially if they 


were in a predominantly Black school. Both 
Black and White teenagers believed that 
being in an honor society would threaten 
their chances of being part of the “in” crowd 
or being popular with the opposite sex, 
but these beliefs were much more common 
among Black students. 172 

This line of research has generated a great 
deal of debate. Several social commenta¬ 
tors and activists, including African Ameri¬ 
can activists, have accepted the findings and 
made efforts to change adolescent attitudes 
in the African American communities. Oth¬ 
ers have argued that the phenomenon is a 
result of socioeconomic status, rather than 
racial/ethnic attitudes. As a statistical phe¬ 
nomenon, though, this debate does not mat¬ 
ter! If Blacks and Latinos are concentrated 
in lower SES groups, then any gap in per¬ 
formance that is based on the performance 
of students in general could appear either as 
an SES or a racial/ethnic disparity. 

Finally, there is the question of the 
schools themselves. In extremely segregated 
societies, such as the American South prior 
to (forced) school integration in the 1950s or
South Africa prior to the end of apartheid, 
schooling was separate and grossly unequal 

172 Cook & Ludwig, 1998. Perhaps a way to improve 
American education is to keep the names of students
on high school honor rolls a secret.

for Whites compared to non-Whites. That
is no longer the situation today, in either 
country, but there is a correlation between 
racial/ethnic status, residential location, and 
the quality of schooling. For instance, in the 
United States schools in heavily Black, low 
SES areas tend not to be desirable positions 
for many teachers. Also, to the extent that 
school financing is dependent upon the local 
tax base, schools located in economically 
distressed areas are at a disadvantage. This 
is especially serious for attempts to control 
student/teacher ratios. At the gross level of 
measuring school quality by the amount of 
financing available per student, school fund¬ 
ing seems to have little influence on out¬ 
comes. A finer-grained analysis provides a 
different picture. 

As was shown earlier, the Black-White 
gap in cognitive skills upon finishing school¬ 
ing is predictable, on an individual basis, 
from the gap upon entry to kindergarten. 
The size of the gap between groups, mea¬ 
sured in deviation units, is substantially 
larger at the end of high school than at the 
beginning of the first grade. This suggests 
that the mathematical relationship is 

d_f = c d_g + e, (11.3)

where d_g is the original Black-White gap, d_f
is the final gap, c is a constant greater than 1, 
and e is a residual term. Furthermore, the 
equation should hold for any educational 
transition. 
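
A brief numerical sketch shows how Equation 11.3 behaves when it is applied at each successive educational transition. The function name and the values used (a multiplier of 1.1 and an initial gap of 0.5, with residuals set to zero) are hypothetical and chosen only to illustrate the shape of the relationship; they are not estimates from the data.

```python
def project_gap(initial_gap, c, transitions, residuals=None):
    """Apply final_gap = c * original_gap + e once per educational transition."""
    if residuals is None:
        residuals = [0.0] * transitions
    gaps = [initial_gap]
    for e in residuals[:transitions]:
        gaps.append(c * gaps[-1] + e)
    return gaps


# Hypothetical values: c = 1.1 and an initial gap of 0.5 are illustrative only.
print([round(g, 3) for g in project_gap(0.5, 1.1, transitions=3)])
```

Whenever c exceeds 1 and the residuals are negligible, each transition enlarges the gap carried into it, so the gap compounds rather than staying constant.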

Equation 11.3 implies that the Black- 
White gap grows steadily throughout the 
school years, which it does. This could be 
taken as evidence for social practices within 
the schools that place Black children at 
a disadvantage. However, a careful analy¬ 
sis of the patterns of accomplishment in 
both Black and White schoolchildren suggests
a different conclusion. 173 Equation 11.3
applies not only to Black-White contrasts, 
but also to contrasts between White chil¬ 
dren who enter school in an advantaged or 
disadvantaged position. Less abstractly, the 

173 Phillips et al., 1998.


gap in performance between students who 
were initially well or poorly prepared, prior 
to entering kindergarten or the first grade, 
increases steadily throughout the school 
years. Because Black children are more likely 
than White children to enter at a disad¬ 
vantage, the progressively increasing gap 
between well-prepared and poorly prepared 
students, regardless of race, can appear as 
an increasing gap between Black and White 
students. 
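
The composition argument can be made concrete. If scores depend only on initial preparation, each group's mean is a mixture of its well-prepared and poorly prepared students, and the gap between groups equals the difference in the shares of poorly prepared students multiplied by the preparation gap. The sketch below uses a hypothetical function name and made-up shares; it only demonstrates that a widening preparation gap produces a widening group gap without any group-specific effect.

```python
def group_gap(share_disadvantaged_a, share_disadvantaged_b, preparation_gap):
    """Gap between group means when only preparation affects scores.

    Each group mean is a mixture of well-prepared and poorly prepared
    students, so the group gap is the difference in shares times the
    preparation gap.
    """
    return (share_disadvantaged_b - share_disadvantaged_a) * preparation_gap


# Hypothetical shares and preparation gaps, chosen only for illustration.
for prep_gap in (0.6, 0.8, 1.0):
    print(prep_gap, round(group_gap(0.2, 0.5, prep_gap), 2))
```

When the two shares are equal the function returns zero regardless of the preparation gap, which is the sense in which the growing gap reflects preparation rather than group membership in this simplified model.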

In sum, there are many social prac¬ 
tices that have an impact on racial and 
ethnic differences in cognitive skills. They 
include parenting practices and cultural 
beliefs about the educational process. These 
in turn interact with issues of school financ¬ 
ing and teacher behavior. It is extremely 
hard to quantify these various social vari¬ 
ables. Furthermore, they are highly interac¬ 
tive and collinear, making an assignment of 
weights or evaluation of causal models diffi¬ 
cult. I do not doubt, though, that the social 
environment of racial and ethnic groups has 
a major influence on the development of 
intelligence within those groups. 

We might then ask why these differences 
in social environments arise in the first place. 
Here we enter the realm of speculation. 

William Julius Wilson, a sociologist at 
the University of Chicago, has argued that 
since the end of segregation and legal 
discrimination the Black community has 
become increasingly bifurcated. 174 Middle- 
class Black families have seized expanded 
opportunities for economic, residential, and 
social improvement. To some extent this has 
led them to disengagement from the “ghet¬ 
tos,” heavily Black areas in the inner cities. 
At the same time changes in the nature of 
work have led to a reduction in the employ¬ 
ment opportunities available to less edu¬ 
cated, less well-off people, regardless of their 
race or ethnicity. This trend has fallen heav¬ 
iest on the Black community, because fam¬ 
ilies there often did not have the financial 
and educational backing required to give 
their children the opportunity to acquire 

174 Wilson, 1997.

the credentials that are becoming increasingly
necessary for social advancement. In
Wilson’s terms there is now a permanent 
underclass that is not necessarily Black but 
tends to be heavily Black. 

If we accept Wilson's sociological anal¬ 
ysis, a psychological corollary follows from 
the Challenge Hypothesis stated in Chap¬ 
ter 1. An underclass that is divorced from 
the mainstream of the post-industrial soci¬ 
ety will fail to develop some of the cognitive 
skills that are required in that society. Since 
the divorce is not complete, there will be an 
overlap between the skills required in both 
the underclass society and the more gen¬ 
eral society. For instance, in modern Amer¬ 
ica an inner-city Black may not speak the 
standard American English dialect, but he or 
she can develop the ability to comprehend it 
by watching television. The same thing can¬ 
not be said for the skills involved in math¬ 
ematical or counterfactual reasoning. It is 
worth noting that in support of this explana¬ 
tion there are studies showing that there are 
no differences between Blacks and Whites 
in knowledge retrieval, providing that the 
opportunity for knowledge acquisition (of 
word definitions) is equal. 175 

Stanley Sue, a professor of Psychology 
at the University of California, Los Ange¬ 
les, has offered a social-belief explanation 
for the predominance of Asian-Americans 
in both high test scores and academic and 
professional achievement. 176 Sue rejects the 
statement sometimes made that Asians in 
America are a modern example of successful 
migration. He agrees that they have indeed 
succeeded in some fields, but argues that a 
careful analysis of mental health indicators
indicates that the success has been accom¬ 
panied by a good deal of stress. He notes 
that Asian-American success has come pri¬ 
marily in fields such as engineering and 
medicine, where education and certification 
play a dominant role. Asian-Americans are 
less prominent in fields such as law and 

175 Fagan & Holland, 2002, 2007. 

176 See Sue and Okazaki, 1990, for a good discussion of Sue’s ideas.


politics, where personal acceptance be¬ 
comes a central issue. 177 Sue argues that 
a concentration on scientific and technical 
careers, combined with a traditional belief 
that hard work will lead to success, can 
account for much of the data on Asian- 
American cognitive accomplishments. 

Ogbu presented an analysis some¬ 
what similar to Sue’s, but for African 
Americans. 178 Like Sue, Ogbu emphasized 
the importance of historically developed 
beliefs within the affected group as deter¬ 
minants of current behaviors. Ogbu then 
went on to distinguish between groups such 
as Asian-Americans, who were voluntary 
emigrants to the United States, and who 
arrived there with a historic belief in the effi¬ 
cacy of hard work, and African Americans, 
who were the descendants of an involun¬ 
tary migration, and who came from a social 
group that, after several generations of slav¬ 
ery and discrimination, had not been given 
very much reason to believe that hard work 
would be followed by rewards. 

I do not know of any similar anal¬ 
ysis of the relations between cognitive 
achievement, current practices, and histor¬ 
ical causes in the Hispanic community. In 
fact, I doubt that there can be such an anal¬ 
ysis for the community as a whole, for it 
is composed of very different subcommu¬ 
nities. Puerto Ricans recently from Puerto 
Rico, Puerto Ricans who moved to the conti¬ 
nental US from two to four generations ago, 
Cuban immigrants who left Cuba following 
the ascension of a communist dictatorship 
there in the 1950s, the descendants of Mex¬ 
icans who moved to present-day California 
and the Southwest over two hundred years 
ago, and recently arrived immigrants and 

177 An exception has to be made for the State of 
Hawaii, where Asian-Americans are the major¬ 
ity ethnic group and, predictably, have dominated 
state politics. In the other forty-nine states only 
two Asian-Americans have been elected governor 
as of January 1, 2010. Both have served recently: 
Gary Locke (Chinese descent) in Washington State
(1997-2005) and Bobby Jindal (Indian descent), 
elected Governor of Louisiana beginning in January 
2008. 

178 Ogbu, 2003. 
children of recent immigrants from Latin
America are very different people. Racial 
and ethnic designations are fuzzy concepts. 
In the case of American Hispanics the con¬ 
cept is very fuzzy indeed. 

To sum up, there are powerful argu¬ 
ments and a good deal of evidence for the 
proposition that racial/ethnic differences in 
intelligence are influenced by a variety of 
social variables. There is no evidence point¬ 
ing to a single explanatory variable, nor is 
there likely to be. Statistically, the variables 
are highly collinear, so that different causal 
models can be said to describe the data to 
very nearly the same degree of accuracy. In 
many cases what appears to be an impor¬ 
tant variable turns out to be a proxy for 
other variables. For instance, income and 
family wealth are statistically predictive of 
the Black-White gap, but when noneco¬ 
nomic family characteristics, such as parent¬ 
ing style, are statistically controlled for, the 
wealth-IQ relation disappears. 179 Logically, 
there are all sorts of reasons to expect inter¬ 
actions between variables. Trying to sort out 
the relationship between cognitive perfor¬ 
mance and contemporary indices of noncog- 
nitive behaviors is a data analyst’s night¬ 
mare. 
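
The proxy-variable problem can be made concrete with a small simulation. The sketch below is not a reanalysis of any study cited here: the variable names (parenting, wealth, score) and all effect sizes are invented for illustration. It assumes that wealth and test score each depend on parenting but not on each other, and then shows that wealth still correlates with score until parenting is controlled for with a first-order partial correlation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y with z statistically controlled."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical causal structure: parenting drives both wealth and score.
parenting = [rng.gauss(0, 1) for _ in range(n)]
wealth = [0.8 * p + 0.6 * rng.gauss(0, 1) for p in parenting]
score = [0.5 * p + math.sqrt(0.75) * rng.gauss(0, 1) for p in parenting]

r_wealth_score = pearson(wealth, score)
r_wealth_parenting = pearson(wealth, parenting)
r_score_parenting = pearson(score, parenting)
r_wealth_score_given_parenting = partial_corr(
    r_wealth_score, r_wealth_parenting, r_score_parenting
)

print(f"wealth-score correlation:            {r_wealth_score:.2f}")
print(f"wealth-score, parenting controlled:  {r_wealth_score_given_parenting:.2f}")
```

In this constructed case the raw wealth-score correlation is sizable, while the partial correlation is close to zero. Real data are messier, because the true causal structure is unknown and several such structures can fit collinear data about equally well.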

Social commentators disagree about how 
the present situation came to pass. Ogbu and 
Wilson disagree over whether the situation 
for African Americans has developed over 
two or three centuries or is due to devel¬ 
opments in the last half-century. Sue falls 
somewhere in between, referring to beliefs 
held by an immigrant community that has 
been developing from successive waves of 
entrants for the last 150 years, but refer¬ 
ring (somewhat generally) to social tradi¬ 
tions that have a history of over a thousand 
years. 

I do not think we are likely to resolve 
these issues. As is the case for evolutionary
psychology’s “man the hunter” hypothesis,
the stories about the historic origins
of present-day behavioral practices are rea¬ 
sonable, but to distinguish among them we 
would have to have access to historical 

179 Yeung & Conley, 2008. 


data that has probably disappeared forever. 
More generally, rigorous methods of scien¬ 
tific analysis are difficult to apply to historic 
events. A social commentator may be able 
to convince us that his or her way of looking 
at the origins of racial/ethnic differences in 
intelligence is useful, but is unlikely to ever 
have either a sufficiently precise theory or 
sufficiently rich data to discriminate among 
different viewpoints. 

11.4.10. Biological Causes for Racial 
and Ethnic Differences 

Potential biological causes of racial/ethnic 
differences in intelligence may be either 
environmental or genetic. 

Chapter 9 presented evidence on three 
of the most toxic environmental pathogens: 
exposure to lead, excessive use of alcohol 
and other psychoactive agents (alcohol is by 
far the biggest problem), and severe mal¬ 
nutrition. What data there is indicates that 
atmospheric and environmental lead expo¬ 
sure may be greater in the African Ameri¬ 
can and Hispanic communities than in the 
White community. I know of no data for 
Asian communities, and in this case a dis¬ 
tinction would have to be made between 
recent immigrant groups and others. 

Alcohol abuse is more prevalent in the 
Black and Hispanic groups than in the 
White and Asian groups. Alcohol abuse is 
highly correlated with SES, thus confus¬ 
ing the issue. Failure to follow good dietary 
regimes is also a problem within the African 
American and Hispanic communities and, 
like alcohol abuse, is correlated with SES. 
Whether nutritional deficiencies in the two 
minority communities are severe enough to 
be a significant threat to cognitive develop¬ 
ment is debatable, although there certainly 
are international examples of severe malnu¬ 
trition in Africa and Latin America. 

What we lack are quantitative stud¬ 
ies determining whether differential expo¬ 
sures to biologically hazardous environ¬ 
ments (including self-induced ones) could 
account for the substantial gap in IQ scores 
between Whites and the two minority com¬ 
munities. The situation is not analogous 
to the problem of discriminating between
alternative accounts of history. Questions 
about the role that environmental hazards 
play in determining racial/ethnic differences 
in intelligence can be answered. As far as I 
know, though, the needed research has not 
been done. 

There is a great deal of contention 
over the role that genetic differences play 
in establishing racial/ethnic differences in 
intelligence. The following two quotes 
summarize the conclusions of two recent 
reviewers: 

For the race differences in IQ, we can be 
confident that genes play no role at all. 

Nisbett, 2009, p. 197 

... genetic and cultural factors carry the
exact same weight in causing the mean 
Black-White difference in IQ as they do 
in causing individual differences in IQ, 
about 80% genetic-20% environmental by 
adulthood. 

Rushton & Jensen, 2005, p. 279

There seems to be a difference of opinion. 

The argument for genetic causes for 
group differences has been maintained by 
several serious researchers over the years. 
The three most prominent advocates of this 
position today are Arthur Jensen of the 
University of California, Berkeley; Richard 
Lynn of the University of Ulster; and J. 
Philippe Rushton of the University of Western
Ontario. The arguments they propose,
which are essentially identical, were well 
presented in a 2005 paper by Rushton and 
Jensen. 180 Before discussing these claims,
a comment about how genetic effects 
might influence racial/ethnic differences is 
in order. 

No one inherits a test score. In order to 
make a direct argument for a genetic cause 
for group differences in intelligence one has 
to show that there is some biological char¬ 
acteristic that can account for differences in 
intelligence between the groups being dis¬ 
cussed and that the characteristic is known 
or reasonably believed to be under genetic 

180 Rushton & Jensen, 2005. 


control. Two such characteristics have been 
proposed. 

One is information-processing capacity, 
and especially speed of information pro¬ 
cessing, as measured by elementary cogni¬ 
tive tasks such as reaction time paradigms. 
Rushton and Jensen argue that there is a 
substantial difference in speed of process¬ 
ing between Whites and Blacks, in favor 
of Whites, and smaller differences between 
Whites and Asians, in favor of Asians. 
The evidence they offer is based largely 
on Jensen's studies of differences in reac¬ 
tion times between Whites and African 
Americans on fairly simple tasks. 181 How¬ 
ever, a review of a wider range of stud¬ 
ies of racial/ethnic differences in reaction 
time and other information-processing tasks 
found considerable inconsistency in results. 
The reviewers concluded that the difference 
between Blacks and Whites, which would be 
expected to be the largest difference, was on 
the order of d < .20. 182 Given that the correlation
between speed-of-processing measures
and IQ scores is around .5, after correction 
for various artifacts, the expected contribu¬ 
tion of speed-of-processing measures to the 
Black-White test score gap would be .10, 
while the gap itself is on the order of 1.0. 
A lot remains to be explained. 

Rushton and Jensen, and (in separate 
papers) Lynn, have also proposed that the
difference between groups in test scores 
is due to differences in brain size. Brain 
size does have a correlation of about .35 
with intelligence within the White popula¬ 
tion. Brain size is almost entirely genetically 
determined. 183 Therefore, evidence for substantial
differences between racial/ethnic
groups in brain size would be an impor¬ 
tant link in an argument for a genetic 
basis for group differences in intelligence. 
However, such studies would be difficult 

181 Jensen, 1993. This study can be criticized for having 
given far fewer practice trials, fifteen, than is usual in 
reaction time studies. Studies within cognitive psy¬ 
chology have shown that speed measures become 
limiting factors in reaction time studies only after 
dozens, if not hundreds, of trials. See Ackerman, 
1986, 1987, for a discussion. 

182 Shepard & Vernon, 2008. 

183 Baare et al., 2001. 
to arrange, due to the expense of obtaining
brain images. Therefore, researchers inter¬ 
ested in this topic have made estimates of 
brain size differences from external mea¬ 
sures on the skull. This indirect method has 
its problems. 

The correlation between intelligence and 
cranial capacity, estimated from measure¬ 
ments on the skull, drops to about .2, which 
is not surprising as brain size is substan¬ 
tially but imperfectly related to skull size. In 
studies by Rushton the difference between 
White adults and Black adults in cranial 
capacity is 43 cm³, which corresponds to a d
for cranial capacity of .46. 184 Combined with 
a .2 correlation, this leads to the conclusion 
that on the basis of skull size there should 
be a difference of d = .09 between Black and 
White test scores. If we accept the idea that 
brain size and processing speed are statisti¬ 
cally independent, the expected gap due to 
these factors is then .19, still far below the 
observed value of 1.0. 
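
The arithmetic behind these two estimates can be laid out explicitly. The figures in the text are consistent with a simple linear prediction: a standardized group difference d on some trait, multiplied by that trait's correlation with test scores, gives the expected standardized difference in test scores. The sketch below assumes that rule, treats the "d < .20" bound for processing speed as .20, and adds the two contributions on the stated assumption that they are independent. The function and variable names are mine.

```python
def expected_score_gap(trait_gap_d, trait_score_correlation):
    """Expected test-score gap (in SD units) implied by a trait gap of
    trait_gap_d SD units when the trait correlates with test scores at
    trait_score_correlation."""
    return trait_gap_d * trait_score_correlation


# Processing speed: d taken at its upper bound of .20, correlation about .5.
speed_gap = expected_score_gap(0.20, 0.5)

# Cranial capacity: d = .46, correlation with test scores about .2.
cranial_gap = expected_score_gap(0.46, 0.2)

# Contributions simply add if the two traits are statistically independent.
combined_gap = speed_gap + cranial_gap

observed_gap = 1.0
unexplained_gap = observed_gap - combined_gap

print(f"speed of processing: {speed_gap:.2f}")
print(f"cranial capacity:    {cranial_gap:.2f}")
print(f"combined:            {combined_gap:.2f}")
print(f"left unexplained:    {unexplained_gap:.2f}")
```

If the two traits were positively correlated rather than independent, the combined figure would be smaller than the simple sum, so .19 is best read as an upper estimate under these assumptions.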

Richard Lynn has argued that changes 
in intelligence over the past fifty years, the 
cohort (Flynn) effect described in Chapter 9,
are probably due in part to changes in 
nutrition. 185 And he has used the correla¬ 
tion between cranial capacity and intelli¬ 
gence test scores to buttress his argument, 
for cranial capacity, unlike brain size, does 
have a substantial environmental compo¬ 
nent. To the extent that Lynn is right about 
nutrition, this would reduce the genetic 
effect upon intelligence associated with 
group differences in measurements on the 
skull. 

As further evidence of a genetic basis for 
Black-White differences Rushton and Jensen 
point to various studies in which the method 
of correlated vectors was used to show that g 
loadings on tests correlate with Black-White 

184 This is based on the figures provided in Rushton, 
1992, p. 405, text and Table 2. Rushton does not 
present a d value. I have calculated it by estimating 
standard deviations within groups provided in his 
Table 1, and then computing the average within- 
group variances. The mean difference was taken 
from the numbers given in the text, adjusted for 
military rank and sex. 

185 Lynn, 1998. 


differences and with heritability estimates. 
The ambiguities in the method of corre¬ 
lated vectors have already been commented 
upon. 

Rushton and Jensen also make a good deal 
of what they refer to as “admixture stud¬ 
ies,” in which mixed-race groups are shown 
to have intelligence test scores intermediate 
between those of Whites and Blacks. In gen¬ 
eral, such studies are done outside of the 
United States (e.g., in South Africa), where
“Black” and “White” are more clearly defined 
groups than they are in the US. The mixed 
groups are generally intermediate on a vari¬ 
ety of social and educational measurements. 
The studies are ambiguous unless these vari¬ 
ables are controlled. 

Rushton and Jensen, and Lynn in other writings
(to be discussed in the next section),
make a few more points, but these
are their major ones. In general, I find their 
arguments not so much wrong as vastly over¬ 
stated. But overstatement does not mean 
that there is no point to them. Rushton and 
Jensen present two conclusions. 

1. They argue that the hypothesis that 
Black-White (and, by extension, Asian 
and Hispanic) differences in intelli¬ 
gence are entirely due to environment, 
the hypothesis they refer to as the 
100% environment hypothesis, cannot 
be maintained. 

2. They propose as an alternative “default 
hypothesis” that the Black-White differ¬ 
ence is 80% due to genetic differences. 
They base this conclusion on the obser¬ 
vation that within Whites intelligence 
test scores have a heritability coefficient 
of .8. 

Rushton and Jensen (and Lynn) are cor¬ 
rect in saying that the 100% environmental 
hypothesis cannot be maintained. Nisbett’s 
extreme statement has virtually no chance 
of being true. However, the 100% environ¬ 
mental hypothesis is something of a stalking 
horse. Many researchers who are primarily 
interested in environmental differences asso¬ 
ciated with racial and ethnic differences in 
intelligence would not be at all perturbed
by an ironclad demonstration that, say, 3% of 
the gap is due to genetic differences. The real 
issue is over the identity and size of genetic 
and environmental influences on group dif¬ 
ferences in intelligence, not the existence of 
either one. 

The 80% default hypothesis is an extreme 
and excessively precise statement. It is based 
on the assumption that the factors that con¬ 
tribute to the between-group differences 
are the same as the factors that contribute 
to within-group differences. This is doubt¬ 
ful. The genetic variability associated with 
the “continent of origin” distinction (races,
for short) is much less than the variability 
between people with the same continent of 
origin. Estimates have placed the between-
group variation at 5% to 15% of the
total variation within the human genome. 
On a statistical basis this is quite enough 
to identify genetic clusters associated with 
racial groupings. 186 There are certainly envi¬ 
ronmental differences between racial/ethnic 
groups. Do they amount to 5% of the total 
environmental variation in the United States 
population? Are the genetic and environ¬ 
mental variations associated with intelli¬ 
gence distributed between groups in exactly 
the same proportion as they are distributed 
within groups? We have no idea. 
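
The logical gap between within-group and between-group statements can be shown with a constructed example. The simulation below is purely hypothetical and makes no claim about any real population. It builds two groups in which the within-group heritability is about .8, as in the default hypothesis, and then places the entire one-standard-deviation gap between them in the environmental term. All names and parameter values are assumptions chosen for illustration.

```python
import random
import statistics

H2_WITHIN = 0.8  # within-group heritability figure quoted in the text


def simulate_group(n, rng, genetic_shift=0.0, environment_shift=0.0):
    """Return (genetic values, phenotypes) for one hypothetical group.

    Phenotype = genetic value + environmental value. Within-group
    phenotypic variance is about 1, with a genetic share of about
    H2_WITHIN. The shift arguments move group means only.
    """
    genetic = [rng.gauss(genetic_shift, H2_WITHIN ** 0.5) for _ in range(n)]
    environment = [
        rng.gauss(environment_shift, (1.0 - H2_WITHIN) ** 0.5)
        for _ in range(n)
    ]
    phenotype = [g + e for g, e in zip(genetic, environment)]
    return genetic, phenotype


def within_group_heritability(genetic, phenotype):
    """Share of phenotypic variance carried by the genetic values."""
    return statistics.pvariance(genetic) / statistics.pvariance(phenotype)


rng = random.Random(42)  # fixed seed so the run is repeatable
gen_a, phen_a = simulate_group(20000, rng)
gen_b, phen_b = simulate_group(20000, rng, environment_shift=-1.0)

h2_a = within_group_heritability(gen_a, phen_a)
h2_b = within_group_heritability(gen_b, phen_b)
phenotypic_gap = statistics.fmean(phen_a) - statistics.fmean(phen_b)
genetic_gap = statistics.fmean(gen_a) - statistics.fmean(gen_b)

print(f"within-group heritability: A = {h2_a:.2f}, B = {h2_b:.2f}")
print(f"phenotypic gap (SD units): {phenotypic_gap:.2f}")
print(f"genetic gap (SD units):    {genetic_gap:.2f}")
```

Passing genetic_shift=-1.0 instead of environment_shift=-1.0 yields the same within-group heritabilities and the same phenotypic gap, with the gap now entirely genetic. The within-group figure is the same in both constructions, which is why it cannot by itself fix the between-group proportion at 80% or at any other value.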

This is a topic about which there is a 
great deal of emotion and an unfortunate 
paucity of data. The direct evidence that 
we have for genetic effects does not come 
close to accounting for the size of the gap 
between White and African American test 
scores. Neither do environmental effects. 
And, unfortunately, the environmental evi¬ 
dence has often been presented as evidence 
that environmental effects do occur — which 
no advocate of genetic models has ever 
denied - but has not been presented in a 
way that permits a quantitative estimate 
of how important environmental effects are 
in determining group differences in intelli¬ 
gence in the population. 

186 Bamshad et al., 2004; Edwards, 2003; Tang et al., 2005.


The questions raised by the existence 
of racial/ethnic group differences in intelli¬ 
gence are complex. The genome exerts both 
distal and proximal effects, so sometimes it 
is not clear whether a particular influence 
should be regarded as a genetic or an envi¬ 
ronmental one. Consider the case of par¬ 
enting practices, shown to be an influence 
on a child’s development of intelligence. 
If these practices are partly under control 
of the parents’ genomes, and partly under 
the control of parental education, then the 
effect on the child could be called either 
environmental or genetic! Or an interaction 
between the two. Or we could say that the 
effect is genetic, but that it can be mod¬ 
ified by education, just as any number of 
genetic disorders can be modified by medical 
treatment. 

In spite of their complexity, questions 
like these are answerable. We can conduct 
quantitative searches for proximal and dis¬ 
tal influences, on both the genetic and envi¬ 
ronmental sides. While we do not know, 
today, what genes are associated with intel¬ 
ligence, we will find them. When the genes 
are identified it will be possible to deter¬ 
mine allele frequencies across different pop¬ 
ulations (with the caution that if the groups 
are in contact these frequencies will change 
over time!). Studies of environmental influ¬ 
ences could go beyond the demonstration 
stage to the stage of making estimates of 
the size of these effects outside of the 
laboratory - once again remembering that 
we will be shooting at a moving target, 
for environments drift faster than gene 
frequencies. 

This summary will probably not satisfy 
those who have taken strong stands on either 
side of the debate over racial and ethnic 
differences in intelligence. Bold hypotheses 
“rally the troops” and make great entrees 
for television talk shows. People who take 
intermediate positions are said to be “wishy- 
washy” or “afraid to say what they really 
think.” Nevertheless, the issue is complex, 
and oversimplifications do not help. There 
are group differences in intelligence, they 
are important, and there are both scientific 
and social reasons for trying to understand
them. Plausible cases can be made for both 
genetic and environmental contributions 
to differences in intelligence. The evidence 
required to quantify the relative sizes of 
these contributions to group differences is 
lacking. The relative sizes of environmental 
and genetic influences will vary over time 
and place. Some of these influences may 
be amenable to change, while others will 
be resistant to change. The relevant ques¬ 
tions can be studied. Denials or overly pre¬ 
cise statements on either the pro-genetic 
or pro-environmental side do not move the 
debate forward. They generate heat rather 
than light. 

And this is what I really believe! 

11.5. The International Picture 

Wealth and health are not distributed uni¬ 
formly across nations; the countries of 
Europe and North America and their deriva¬ 
tive oceanic cultures are best-off; Japan, 
Korea, and more recently China are closing 
the gap; sub-Saharan Africa is worst-off. The 
Middle East and South Asian nations and 
Latin America fall in between. Neither the 
genetic composition nor the socioeconomic 
histories of these regions are the same. To 
what extent is the present unequal distribu¬ 
tion of health and wealth due to differences 
in the cognitive capacities of different peo¬ 
ples, and to what extent must these differ¬ 
ences be taken into account as economies 
are developed over the globe? 

This is an old question. Herodotus specu¬ 
lated about different temperaments of peo¬ 
ples in the fifth century BCE. From the 
seventeenth century until modern times 
economists have discussed the topic under 
the general title human capital. This term 
is used collectively to refer to the devel¬ 
oped capabilities of a workforce, which can 
vary from being literate to having a high per¬ 
centage of people with engineering degrees. 
It is generally agreed that human capital is 
very important in determining the economic 
potential of a state or nation. Political lead¬ 
ers are more likely to justify taxes to support 


schools by claiming that an educated popu¬ 
lation has a greater economic potential than 
an uneducated population than by claiming 
that the educated have a moral and philo¬ 
sophical advantage. 187

Economists seldom use the word “intel¬ 
ligence.” They are interested in the dis¬ 
tribution of developed skills in a society, 
and the implications of the distribution. 
How those skills came to be is of less concern
to them. Economists are also well aware
that for some people the word “intelligence” 
has connotations of fixed, genetically deter¬ 
mined constraints on mental capacity. Any¬ 
one interested in influencing policies con¬ 
cerning “human capital” (the economists’ 
euphemism for intelligence) is well advised 
to avoid using the word, for if someone 
does suggest that intelligence limits the socio¬ 
economic development of nations there is 
unnecessary Hell to pay. A spectacular case 
of this is presented in panel 11.6. 

However, some people do look at the 
international distribution of intelligence, 
and do argue that the present-day (and 
presumably past and future) distribution 
of wealth, health, and happiness is due 
in part to differences in national intelli¬ 
gence. Richard Lynn, whose work on sex and 
race/ethnic differences was discussed earlier
in this chapter, and Tatu Vanhanen, a
Finnish economist, invigorated the field with 
the publication of two challenging books 
and some related papers. I shall be highly 
critical of their empirical work, and even 
more so of their interpretations. They do 
deserve credit for raising important ques¬ 
tions in a way that has resulted in interest¬ 
ing and important findings. Before present¬ 
ing the Lynn and Vanhanen research and 
studies that amplified and redirected it, let 
us consider the problems faced by any study 
of national intelligence. 

187 Interestingly, this is a change in view. Thomas Jef¬ 
ferson argued for education on the grounds that 
it would make people better able to participate in 
political, not economic, life. There is some evidence 
that intelligence test scores are correlated with the 
relatively liberal values that have evolved (substan¬ 
tially!) from Jefferson's views of government and 
society. See Deary, Batty, & Gale, 2008. 
11.5.1. Methodological Issues

Estimating the distribution of intelligence in 
a nation requires a probability sample of the 
residents in each nation and the use of tests 
that are valid in all the nations under review. 
These are stringent requirements. Represen¬ 
tative national samples are hard to obtain 
in the most advanced post-industrial coun¬ 
tries. The problem is virtually unsolvable in 
less developed countries, especially if these 
countries are subject to major unrest. As of 
2010 there is no way that the governments of 
the war-torn countries of the Congo, Soma¬ 
lia, or Afghanistan could obtain a census or 
probability sample of all their residents, for 
any purpose. 

It is often much easier to approximate 
a probability sample in a selected seg¬ 
ment of the population, such as schoolchil¬ 
dren (the most likely accessible population) 
or, in some countries, registrants for mili¬ 
tary service. You then face three problems. 
The accessible populations may differ across 
countries; the estimates will apply only to 
the segment of the population involved; 
and the recruitment procedures that con¬ 
struct the accessible samples from the gen¬ 
eral population may differ across nations. 
The problem is particularly acute in the case 
of schoolchildren. 

Consider the following contrast. In the 
United Kingdom essentially all children go 
to school, and the government has, from 
time to time, given well-standardized cog¬ 
nitive tests to students. In the Congo large 
parts of the country are cut off from gov¬ 
ernment control, and schooling outside the 
major cities is haphazard at best. Inferences 
about national intelligence based upon stud¬ 
ies of schoolchildren in the United Kingdom 
and the Congo would hardly be comparable. 

These examples are extreme. Differen¬ 
tial recruitment can occur in well-developed 
nations. South Korea reported that 42% of 
the participants in the Program for International
Student Assessment (PISA) were
girls; France reported 53%. 188 Any
comparison between the PISA results from 

188 Rindermann, 2007. 


Korea and France would be confounded by 
male-female differences in sample recruit¬ 
ment. 
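
The size of such a recruitment artifact is easy to work out. The sketch below uses the two reported sample compositions (42% and 53% girls) together with sex-specific mean scores that are invented for illustration and are not PISA results. It assumes the two countries have identical means for girls and identical means for boys, so that any difference in the overall sample means is produced by sample composition alone.

```python
def sample_mean(share_girls, mean_girls, mean_boys):
    """Overall sample mean as a mixture of the two sex-specific means."""
    return share_girls * mean_girls + (1.0 - share_girls) * mean_boys


# Hypothetical sex-specific means, assumed identical in both countries.
HYPOTHETICAL_MEAN_GIRLS = 510.0
HYPOTHETICAL_MEAN_BOYS = 490.0

korea_mean = sample_mean(0.42, HYPOTHETICAL_MEAN_GIRLS, HYPOTHETICAL_MEAN_BOYS)
france_mean = sample_mean(0.53, HYPOTHETICAL_MEAN_GIRLS, HYPOTHETICAL_MEAN_BOYS)

# Equivalent closed form: (difference in share of girls) * (girl - boy gap).
composition_artifact = france_mean - korea_mean

print(f"Korea sample mean:    {korea_mean:.1f}")
print(f"France sample mean:   {france_mean:.1f}")
print(f"composition artifact: {composition_artifact:.1f}")
```

The artifact equals the difference in the proportion of girls (.11) times whatever the within-country sex difference happens to be on the test in question, and its sign follows the sign of that difference.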

Obtaining comparable and appropriate 
intelligence tests is equally challenging. The 
point has already been made that the tests 
are weighted toward evaluating skills that 
are relevant to the developed world, and that 
any test necessarily is partly an evaluation of 
the possession of test-taking skills. One has 
to be open to the possibility that outside 
the developed world people may not have 
had a chance to acquire the skills relevant 
to industrial and post-industrial society, and 
even that the people being examined may 
not have acquired the skills required to deal 
with the test-taking paradigm itself. 

11.5.2. Lynn and Vanhanen's Findings

Lynn’s position is set forward in numerous
papers and in two books, both coauthored
with the Finnish economist Tatu
Vanhanen. 189 The second book is basically an 
expansion of the first, with added data and 
analyses, none of which changes any of the 
conclusions of the first. I shall concentrate 
on the presentation in the second book, IQ 
and Global Inequality. Specific papers will 
be cited only if needed to make a point. 

Lynn and Vanhanen regard intelligence as 
being largely a genetically determined trait. 
Although they do not say so precisely, their 
argument makes no sense unless they also 
regard the genetically defined reaction range 
as being narrow, as they are not optimistic 
about the effects of such things as improved 
education. 

They believe that intelligence makes a 
contribution to indicators of national health 
and wealth both through its interaction with 
education and directly. Their argument is 
that intelligent people are more educable 
(which is certainly true) and that, in addi¬ 
tion to the benefits of education, intelligent 
people are generally better reasoners, and 
hence more able to deal with the problems 
of a complex society. Lynn, in his numer¬ 
ous publications, has also made it clear 

189 Lynn & Vanhanen, 2002, 2006. 
Table 11.5. Correlations between IQ estimates and selected indices of national well-being

Index of Quality of Life                            Correlation with IQ Estimate

Gross national income/person
  (adjusted for local purchasing power)             .684
Adult literacy rate                                 .642
Tertiary education                                  .746
Life expectancy                                     .773
Democratization Index                               .568

Note: Data selected from Lynn and Vanhanen, 2006, Table 6.1. 
Figures shown indicate the data based on 113 observed points. 


that he regards intelligence test scores as 
equally valid across nations, including those 
nations usually referred to as “developing” 
or “Third World.” For instance, he and Vanhanen
include in their database test scores
from the United States (mean IQ = 98), the 
United Kingdom (100), the Marshall Islands 
(84), China (105), Ghana (71), and Equatorial 
Guinea (59, to be discussed later). 190 Their 
statistical treatment gives all points equal 
validity. 

Lynn and Vanhanen obtained estimates 
of IQs for each nation by searching vari¬ 
ous publication sources. This produced data 
for 113 countries. They estimated the IQs 
for another seventy-nine countries by using
“comparison IQs” of “neighboring or other
comparable countries.” This provided them
with three sets of data, 113 direct observa¬ 
tions, 79 imputed data points, and a total of 
192 data points, obtained by combining the 
first two sets. 

In their main analyses Lynn and Van¬ 
hanen considered the relation between the 
IQ measures and five measures of, as they 
put it, “[t]he quality of human condi¬ 
tions.” The measures of quality they used 
were gross national income per capita 
(corrected for local purchasing power), 
the adult literacy rate, the fraction of 
the population enrolling in tertiary educa¬ 
tion (college/university and beyond), life 
expectancy, and an index of democratiza¬ 
tion developed by Vanhanen from previous 


work. This index can be thought of as a 
measure of the extent to which citizens of 
a country are free to participate in polit¬ 
ical life. The highest marks on the index 
(forty and above) were assigned to Belgium, 
Denmark, the Netherlands, and Switzer¬ 
land. Most large European and North Amer¬ 
ican countries scored in the mid-thirties. 
Dictatorships that (as of 2010) enforced a 
one-party state, such as China, Cuba, Kaza¬ 
khstan, and North Korea, scored zero. 

The variables were merged into a single 
quality of human conditions (QHC) index. 
Lynn and Vanhanen then calculated regres¬ 
sion equations predicting the QHC index or 
one of its components from the IQ scores, in 
each of their three data sets. Additional anal¬ 
yses were also carried out, relating IQ scores 
to other social indices, such as measures of 
undernourishment and economic freedom. 
None of these analyses made any difference 
to Lynn and Vanhanen’s conclusions, so I 
will concentrate on their central points. 

Table 11.5 shows the correlations between 
the IQ estimates and the five indices of qual¬ 
ity of human conditions. The correlations 
range from a high of .773 for life expectancy 
to a low of .568 for democratization. Given 
this level of correlation with the individual 
variables, it is inevitable that there will be 
a high correlation with the composite QHC 
index, and there is, .805 in the 113 observed 
data points. 191 The data presented is typi¬ 
cal of a number of other analyses, all of 


190 Lynn & Vanhanen, 2006, Table 4.3. 


191 Lynn & Vanhanen, 2006, Table 7.1. 

which make the same point. On a national 
basis the IQ estimate is a good predictor of 
socioeconomic indicators of quality of life. 
Lynn and Vanhanen reach a stronger con¬ 
clusion, about causation, which I will discuss 
later. 

Lynn and Vanhanen present one other 
set of analyses that are relevant to our con¬ 
cerns. They group countries by geographic 
area, note the predominant racial groups 
in those areas, and then calculate a repre¬ 
sentative value for IQ by race, based on 
this data. For instance, sub-Saharan Africans 
(i.e., Black, “Negroid” in some of Lynn’s 
writings) are assigned a typical IQ of 67, 
East Asians 105, Europeans 99, 
and Southeast Asians 90. They further 
show that by assigning intermediate values 
to “mixed-race” groups, such as the Mestizos 
in some Latin American countries and “col¬ 
oreds” in South Africa, they can produce an 
estimated IQ for the Latin American nations 
that predicts observed values. 

Lynn and Vanhanen, and Lynn in other 
writings, draw three conclusions from their 
research. 

1. The substantial correlations between IQ 
scores and various socioeconomic indicators 
show that differences between 
countries in wealth, health, and to some 
extent happiness are very largely caused 
by differences in intelligence. 

2. The differences in intelligence between 
nations are largely due to differences in 
the racial composition of the national 
and regional population. 

3. While the differences have some envi¬ 
ronmental roots, the major differences 
are due to genetic inheritance. Nutritional 
differences are the most important 
environmental factors. Lynn and Vanhanen do 
allow some influence for education, but 
believe that it is not likely to be large. 

Conclusions like these are bound to be 
controversial. Therefore, it is worth taking a 
closer look at the Lynn-Vanhanen data, and 
then looking at another analysis aimed at the 
same questions. 


The Lynn and Vanhanen database for 
intelligence is highly suspect in one way, and 
accurate enough in another. 

In both their publications Lynn and Van¬ 
hanen disregarded any question about the 
validity of various intelligence tests across 
different countries and cultures, and ignored 
considerations of differential sampling. In 
some cases their criteria for inclusion can 
only be described as a blunder. For instance, 
their data point for Equatorial Guinea, said 
to have a national IQ of 59 (which would 
be well into the mentally retarded range 
in a developed country), was taken from a 
report 192 that, while it did discuss Africa, 
stated clearly that the IQ 59 figure referred 
to children in a school for the developmentally 
disabled in Madrid! Many of the samples 
were grossly unrepresentative, and the 
criteria for selection of studies to include in 
their work were not made clear. 

These concerns have led critics to claim 
that the data for developing countries is 
both unrepresentative and extremely sus¬ 
pect. In order to test this hypothesis Werner 
Wittmann and I conducted separate anal¬ 
yses of the data points for developed and 
developing countries, as presented in Lynn 
and Vanhanen’s 2002 book. 193 We found that 
the residuals (i.e., the extent to which their 
predictions missed their targets) were much 
greater for developing than for developed 
countries. In much more detailed reviews, 
which found many more problems of selectivity 
and scholarship in the Lynn and Vanhanen 
data set, Wicherts, Dolan, and van 
der Maas raised the estimate of the median 
sub-Saharan African’s IQ from Lynn’s estimate 
of 67 (an “educable mental retardate” in the 
jargon of clinicians in the developed countries) 
to 82. 194 Wicherts and his colleagues 

192 Fernandez-Ballesteros et al., 1997. 

193 Hunt & Wittmann, 2008. 

194 Wicherts, Dolan, & van der Maas, 2010a. Wicherts, 
Dolan, and van der Maas based their calculations 
on a large number of studies using the Raven tests, 
in order to avoid problems involved in comparing 
different tests. Lynn and Meisenberg (2010) claimed 
that the studies Wicherts and his colleagues had 
used were not representative, and offered their own 
survey, which reinforced their original point. In a 
second paper Wicherts, Dolan, and van der Maas 

pointed out that if you go backward in 
time by correcting for the increases in IQ 
scores over the past fifty years, this is the 
modern equivalent of the IQ of the Dutch 
just after World War II. The work of Wicherts 
and colleagues showed that Lynn and 
Vanhanen’s research was highly biased 
toward underestimating IQs in sub-Saharan 
Africa. Given that this was true of 
Africa, one has to be more than a little suspi¬ 
cious of the accuracy of their data for other 
developing countries. 

In spite of these concerns, Lynn and 
Vanhanen's conclusions about the correla¬ 
tions between IQ estimates and measures of 
social well-being are probably correct. The 
spreads between nations in terms of various 
measures of social well-being are so wide 
that it is fairly easy to find a correlation 
between them. Also, as measures of social 
well-being are markedly collinear, a predic¬ 
tor that tracks one outcome measure will 
probably track them all. These points were 
illustrated by two researchers from George 
Mason University, Deborah Whetzel and 
Michael McDaniel. They assigned a value of 
90 to all reported national IQs below 90 in 
the 2002 database. This arbitrary assumption 
was higher than the Wicherts and colleagues 
estimate, which was not available at the time 
that Whetzel and McDaniel did their anal¬ 
ysis. All major trends in the 2002 data still 
held. I am virtually certain that the same 
thing would happen in the 2006 database, 
although I do not know of any such analysis. 
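
The logic of the Whetzel and McDaniel check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the twelve "countries" and their scores below are invented for the example and are not taken from the Lynn and Vanhanen database, and the `pearson` helper is a hypothetical name for an ordinary Pearson correlation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical national IQ estimates and a hypothetical well-being index.
# Invented for illustration; these are NOT Lynn and Vanhanen's data.
iq = [64, 68, 72, 78, 84, 88, 92, 96, 99, 101, 104, 106]
wellbeing = [20, 28, 25, 35, 40, 45, 55, 62, 70, 68, 80, 85]

# The adjustment described in the text: every estimate below 90 is set to 90.
iq_floored = [max(score, 90) for score in iq]

r_raw = pearson(iq, wellbeing)
r_floored = pearson(iq_floored, wellbeing)
print(f"raw r = {r_raw:.2f}, floored r = {r_floored:.2f}")
```

In invented data of this kind the correlation stays substantial after the floor is applied, because the contrast between the low-scoring and high-scoring groups survives even when the low scores are all pushed up to the same value.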

There is a simple reason for the fact 
that correlations were obtained in spite 
of inaccurate data. Lynn and Vanhanen's 
results were driven by the sharp dif¬ 
ferences in both intelligence test scores 
and measures of social well-being between 
the European-North American-Northeast 
Asian (ENAMA) regions of the world and 
“the rest,” largely sub-Saharan Africa and 
South Asia (SASA). The South America 

(2010b) showed that Lynn and Meisenberg’s techniques 
for selecting papers to cite were not related 
to measures of representativeness in the original 
research, and that they appeared to introduce a sub¬ 
stantial bias toward underestimating the cognitive 
capacities of sub-Saharan populations. 


and the Middle East regions (SAME) generally 
fall somewhere between the other 
two regions, in terms of both intelligence 
test scores and indicators of well-being. 
However, substantial correlations, although 
lower than Lynn and Vanhanen reported, do 
hold within the developed countries, where 
the intelligence estimates are probably far 
more accurate. 195 

11.5.3. A Much Closer Look: 
Rindermann’s Analyses 

Lynn and Vanhanen rested their case largely 
on bivariate correlations between variables, 
which is a suspect procedure. They con¬ 
cluded that gross measures of educational 
accomplishment, such as the literacy rate 
or the number of people involved in ter¬ 
tiary education, were less important than 
intelligence as predictors of national well¬ 
being. A more detailed analysis of their own 
data, presented in Table 11.6, raises ques¬ 
tions about this conclusion. The entries in 
Table 11.6 indicate the fraction of variance 
in the target variable (GNI per capita, life 
expectancy, or Democracy Index) that can 
be associated with various possible causal 
factors. For example, the first entry in the 
first row, .47, indicates that 47% of the vari¬ 
ance in GNI per capita, across nations, can 
be statistically predicted from variations in 
the IQ estimate. That is impressive. The 
second entry in the first row, .07, modifies 
the impression considerably. It shows that 
after allowing for associations between IQ 
and GNI that can be predicted from rate 
of participation in tertiary education, only 
7% of the variance in GNI is associated with 
IQ. The third entry in the row, .29, shows 
that 29% of the variance in GNI is associated 
with national differences in participation in 
tertiary education, after having allowed for 
differences in IQ. 
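
The arithmetic behind entries of this kind can be sketched with the standard formula for a first-order partial correlation. The sketch assumes that the conditional entries are squared partial correlations, as the table title suggests. The three correlations used below are hypothetical values chosen for illustration, not figures from Lynn and Vanhanen, and `partial_r` is an illustrative helper name.

```python
import math

def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical bivariate correlations (NOT taken from the book's tables).
r_iq_gni = 0.70     # IQ estimate with GNI per capita
r_tert_gni = 0.75   # tertiary enrollment with GNI per capita
r_iq_tert = 0.80    # IQ estimate with tertiary enrollment

iq_alone = r_iq_gni ** 2
# Variance in GNI associated with IQ after allowing for tertiary education.
iq_given_tert = partial_r(r_iq_gni, r_iq_tert, r_tert_gni) ** 2
# Variance in GNI associated with tertiary education after allowing for IQ.
tert_given_iq = partial_r(r_tert_gni, r_iq_tert, r_iq_gni) ** 2

print(f"{iq_alone:.2f} {iq_given_tert:.2f} {tert_given_iq:.2f}")
```

With these made-up inputs the pattern resembles the one described in the text: a large squared correlation for IQ alone shrinks sharply once a collinear education variable is controlled, while the education variable retains more of its association after IQ is controlled.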

Analyses such as those in Table 11.6 show 
the difficulty of interpreting bivariate cor¬ 
relations in situations where the variables 
involved are highly collinear. Suppose we 
regard participation in tertiary education as 

195 Hunt & Wittmann, 2005. 

Table 11.6. Squared correlations and partial correlations between 
target variables and IQ scores and education variables 

Variable       IQ     IQ|Tertiary   Tertiary|IQ   IQ|Literacy   Literacy|IQ 
GNI/capita     0.47   0.07          0.28          0.25          0.07 
Life expect.   0.60   0.29          0.07          0.35          0.19 
Democracy      0.32   0.01          0.25          0.11          0.09 


Note: The calculations are based on the data presented in Lynn & Vanhanen, 
2006, Table 6.1. GNI - gross national income; Life expect. - life expectancy 
at birth; Democracy - index of democratic institutions in a nation, developed 
by Vanhanen; Tertiary - fraction of population with some post-secondary 
education; Literacy - literacy rate. 


an indicator of the general state of educa¬ 
tion in a country. One could argue that if 
the population as a whole has generally high 
intelligence the country will have a good 
school system, and as a result a high level 
of income per person. Or one could argue 
that because a country has a good school 
system the people will acquire high intel¬ 
ligence, and thus have high incomes. Or, 
as is more likely, there may be reciprocal 
influences between wealth, intelligence, and 
education. 

We now turn to a study that dealt with 
such collinearities and ambiguities. 

Heiner Rindermann, of the Otto-von-Guericke 
University in Germany, has carried 
multivariate investigations of 
national cognitive skills, wealth, and well-being 
much further, and in a much more 
sophisticated way, than did Lynn and Vanhanen 
(or than I have done in my reanalysis 
of their data here). 196 Lynn and Vanhanen 
paid relatively little attention to interna¬ 
tional assessments of educational data, such 
as the PISA studies, except to use them 
to validate their national intelligence esti¬ 
mates. (In general the correlations between 
the educational assessment data and the 
Lynn and Vanhanen intelligence estimates 
are in the .8-.9 range.) As Rindermann 
notes, researchers in intelligence and edu¬ 
cation use different words, but they all wind 
up generating tests that evaluate reasoning 

196 Rindermann, 2007. 


and knowledge, albeit with somewhat dif¬ 
ferent balances between the two. 

The educational assessment data suffers 
from the disadvantage that it is restricted to 
children and adolescents, but it has several 
counterbalancing advantages. Compared to 
the intelligence data the educational sam¬ 
ples are much larger and more representa¬ 
tive of the countries involved. Where the 
samples appear to be unrepresentative, as in 
the case of differential participation rates by 
students across countries, statistics on the 
manner of unrepresentativeness are often 
available. These statistics can be used to 
make a revised estimate of a hypothetical 
national average. For example, if a country 
has low participation rates because it pro¬ 
vides schools for only some of its children, 
the national estimate, based on the perfor¬ 
mance of schoolchildren, can be adjusted 
downward. Within a study, the same tests 
or tests that are “controllably different,” as 
in the case of translations, are used in each 
country. 

There is another important advantage to 
using educational rather than intelligence 
test data. In several countries comparable 
educational surveys have been carried out at 
different points in time. Longitudinal anal¬ 
ysis can then be applied to determine what 
variables at time 1 best predict the state of a 
country at time 2. If the time period between 
evaluations is close to a generation (thirty 
years for humans), the best data for this 
prediction are the cognitive skills of school 
children at time 1, for they are the ones who 
will be running the society at time 2. 

Rindermann first showed that there is 
a huge general factor for “national cognitive 
skill,” calculated across countries. The 
Lynn and Vanhanen estimates load heavily 
on this factor, although they are not the 
best marker. 197 The centrality of this factor 
is amazing; the range of loadings is from .97 
to 1.0! The reason that this is a bit surprising 
is that some of these tests were intended 
to cover special topics, such as reading or 
mathematics. On a cross-national basis, if 
the children read well they do math well 
(and vice versa). 198 
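
What "loading on a general factor" means can be sketched as follows. The example uses the first principal component of a correlation matrix as a simple stand-in for a general factor, which is an assumption of this sketch and not necessarily the factor-analytic method Rindermann used. The four test labels and all correlations are hypothetical, and `first_component_loadings` is an illustrative helper.

```python
import math

def first_component_loadings(corr, iterations=200):
    """Loadings on the first principal component, found by power iteration."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n  # start from a uniform unit vector
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue for the unit vector v.
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    # A loading is the eigenvector entry scaled by sqrt(eigenvalue).
    return [math.sqrt(eigenvalue) * x for x in v]

# Hypothetical cross-national correlations among four measures.
labels = ["reading", "mathematics", "science", "IQ estimate"]
corr = [
    [1.00, 0.95, 0.96, 0.90],
    [0.95, 1.00, 0.97, 0.91],
    [0.96, 0.97, 1.00, 0.92],
    [0.90, 0.91, 0.92, 1.00],
]

loadings = first_component_loadings(corr)
for name, value in zip(labels, loadings):
    print(f"{name}: {value:.2f}")
```

When every measure correlates highly with every other, as in this invented matrix, all of the loadings come out close to 1, which is the situation the text describes.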

Rindermann then created a measure of 
national cognitive skill, based on the above 
analyses. He dropped the Lynn and Vanhanen 
data set from this analysis but, as 
the high factor loadings show, it could not 
have made much difference if it had been 
included. 199 On a regional basis Rindermann 
reconstructed, with much better data, the 
ENAMA-SASA-SAME groupings that Lynn 
and Vanhanen found. The lowest scores 
are found in sub-Saharan Africa. The only 
exception is Haiti in the Caribbean, where 
scores are also quite low. The two interme¬ 
diate groups of scores are found in South 
Asia, the Middle East and North Africa, and 
in South America. The highest scores are 
found in northeast Asia, Europe, and North 
America. 

Rindermann has reported a number of 
correlations between his cognitive skills 
index and indices of economic and politi¬ 
cal well-being. Some mirror the Lynn and 
Vanhanen data. For instance, the corre¬ 
lation with gross domestic product per 

197 Rindermann, 2007, Table 1. 

198 This does not mean that there are no variations in 
relative skill. Some countries are “tilted” toward 
high math scores or high language scores. See 
Wittmann, 2005. The same countries differ in level 
of performance, as defined by a mixture of math¬ 
ematics, reading, and other scores. In a statistical 
analysis variations in level are much greater than 
variations in tilt. 

199 The factor loading for the Lynn and Vanhanen esti¬ 
mates was .96. Rindermann’s reason for dropping 
this data set was that it was a potpourri of mea¬ 
sures taken at different times, and he wished to do 
longitudinal analyses. 


capita (GDP/c) was .60, close to the Lynn- 
Vanhanen estimate. Other indicators show 
similar trends. The correlation with an index 
of economic freedom was .52, and with eco¬ 
nomic growth .44. 

The correlation between the cognitive 
skills index and fertility (children per 
woman) was strongly negative, -.73. This is 
an important finding, because it may point 
the way to future changes in cognitive skills. 
As was pointed out in Chapter 9, there 
is a negative correlation between indices 
of intelligence and family size. Although 
family sizes are much larger in the devel¬ 
oping than in the industrially developed 
countries, they are dropping as the devel¬ 
oping nations become urbanized, probably 
because of many other social changes related 
to increasing socioeconomic opportunities 
for women. This may presage an increase 
in intelligence in the developing countries 
in coming generations. Only time will tell. 

For those who like food for thought, 
the correlation between national cognitive 
skill and homicide rate was -.23, and 
the correlation between cognitive skill and 
the rate of solved homicide cases was 
.32. Smart populations produce moderately 
smart detectives. 200 

Because educational data was available 
over time, Rindermann was able to conduct 
causal analyses, in which he determined 
the extent to which a social or psycholog¬ 
ical variable measured at time 1 predicts 
social and psychological variables at time 
2. His analysis relied on a technique called 
cross-lagged analysis, which made it possible 
to determine standardized path coefficients. 
These statistics can be interpreted as measuring 
the extent to which a rise of one 
standard deviation unit in a predicting variable 
will cause a rise, in standard deviation 
units, of a predicted variable. The analysis 
produced two interesting results. 
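
For readers who want to see the mechanics, the simplest form of a cross-lagged analysis involves two variables, each measured at two times. Each time 2 variable is regressed on both time 1 variables, and the standardized coefficient on the other variable is the cross-lagged path. The sketch below computes those paths from correlations alone. It is a two-variable simplification, not Rindermann's full model, and every correlation in it is a hypothetical value chosen for illustration.

```python
def cross_lagged_paths(r_x1y1, r_x1x2, r_y1y2, r_x1y2, r_y1x2):
    """Standardized cross-lagged paths for a two-variable, two-wave design.

    Each path is the standardized regression weight of a time 2 variable
    on the other variable at time 1, controlling for its own time 1 value.
    """
    denom = 1 - r_x1y1 ** 2
    x1_to_y2 = (r_x1y2 - r_y1y2 * r_x1y1) / denom
    y1_to_x2 = (r_y1x2 - r_x1x2 * r_x1y1) / denom
    return x1_to_y2, y1_to_x2

# Hypothetical correlations: x = cognitive skill, y = wealth.
cognition_to_wealth, wealth_to_cognition = cross_lagged_paths(
    r_x1y1=0.50,  # skill and wealth, both at time 1
    r_x1x2=0.80,  # stability of skill from time 1 to time 2
    r_y1y2=0.70,  # stability of wealth from time 1 to time 2
    r_x1y2=0.60,  # skill at time 1 with wealth at time 2
    r_y1x2=0.50,  # wealth at time 1 with skill at time 2
)
print(f"{cognition_to_wealth:.2f} {wealth_to_cognition:.2f}")
```

A path of, say, .33 is read as follows: holding the time 1 level of the outcome constant, a difference of one standard deviation in the predictor goes with a difference of about a third of a standard deviation in the outcome at time 2.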

The path coefficient between cognitive 
ability, measured in the 1960-70 period, and 
GDP/c, measured in 1998, was .29, which 
indicates that raising academic achievement 
by one standard deviation unit will “pay off” by 

200 Rindermann, 2008a, Figure 2. 

increasing GDP/c by almost three-tenths of a 
standard deviation, no small 
thing. Going the other way, the path coefficient 
from GDP/c in 1980 to cognitive ability 
in the 1990s was .21, indicating a feedback 
process. Investing in education does pay off 
in the future. 201 This conclusion has been 
strengthened by subsequent analyses, which 
showed that by far the best predictors of 
schoolchildren's cognitive competences are 
countries’ general adult educational level, 
and the extent to which a country provides 
for early educational experiences, in pre¬ 
school and kindergarten. 202 

A second set of path analyses focused on 
the relation between the national level of 
cognitive skill and political development. 203 
Cognitive abilities measured in the 1960-70 
period predicted levels of democratization, 
the rule of law, and political freedom in the 
1990s. An interesting point was that in these 
analyses sheer level of education, as indexed 
by data on attendance at different levels, 
was generally as good a predictor as cog¬ 
nitive abilities, which could be considered 
a measure of educational accomplishment. 
Path coefficients were higher for the polit¬ 
ical than for the economic indicators. An 
increase of one standard deviation unit in 
cognitive abilities in the 1960s and 1970s was 
associated with a .71 deviation unit increase 
in the political freedom index in the 1990s. In 
this case there were no indications of recip¬ 
rocal influence. Political freedom in the early 
period did not predict cognitive ability in the 
later period. 

Dictators have a problem. In order to 
increase their nation’s GDP/c they should 
invest in education. But if they do, it can 
come back to haunt them. 

11.5.4. Contrasting Lynn’s Work 
and Rindermann’s 

Because Lynn’s work has been quoted 
widely by some authors, it is worth look¬ 
ing at the differences between the Lynn and 
Rindermann approaches. 

201 Ibid., Figure 5. 

202 Rindermann & Ceci, 2009. 

203 Rindermann, 2008b. 


Rindermann’s work makes a compelling 
case that investment in education increases 
national cognitive skills, which in turn 
increase wealth, which allows for more 
investment in education. This leaves open 
the question of why some nations have got¬ 
ten on the track toward success, and others 
have not. 

Lynn and Vanhanen have no hesita¬ 
tion. The ENAMA-SASA-SAME group¬ 
ing suggests to them an obvious correla¬ 
tion with race, especially since the lowest 
part of the SASA group is in sub-Saharan 
Africa. 204 They observe parallels with the 
within-country ordering of intelligence by 
racial/ethnic group: Asians, Whites, various 
mixed racial/ethnic groups, and then sub-Saharan 
Africans. They assign a major role 
to genetics, and also allow a role for nutri¬ 
tion, especially if children's malnutrition is 
extreme. 

Rindermann's causal analysis is straight¬ 
forward and believable. It accounts for a 
great deal of the variance in such variables 
as GDP/c, for the residual unexplained vari¬ 
ance amounts to less than 20% of the total 
variance. Therefore, while you can always 
explain a finding by unmeasured variables, 
there is not much left to explain. I hope that 
future studies explore the effect of some 
physical environmental variables, such as 
nutrition, but this, presumably, will come. 

Lynn's explanations cannot be ruled out, 
but he goes far beyond the data. To see this, 
let us consider some alternatives. 

11.5.5. How the International Differences 
Arose: Two “Just So” Stories 

Lynn and Vanhanen and Rindermann agree 
that cognitive skills vary across nations. 
Why? Lynn and Vanhanen, and Lynn in 
other writings, 205 have offered a distal expla¬ 
nation, which purports to explain how 
national differences arose in the genotype 
for intelligence. Their argument is that as 
people migrated northward away from early 
centers of civilization they encountered 

204 Lynn & Vanhanen, 2006, Chapter 12. 

205 Especially Lynn, 2006. 

progressively harsher conditions due to cold 
and wide seasonal variations in the availabil¬ 
ity of food. Therefore, the selective pres¬ 
sure for high intelligence was stronger in 
the high northern latitudes. As a result the 
people in northern Europe and northeast 
Asia, and populations derived from them, 
such as the present North American 
and Australia-New Zealand populations, 
became smarter than those who had 
remained behind, in the more forgiving cli¬ 
mate of the subtropics. 

This sort of explanation can be inter¬ 
preted as the regeneration of the justifica¬ 
tions for European colonization. But an idea 
is not wrong just because the Victorians (and 
Herodotus!) thought it to be true. There 
have been attempts to fill out and mod¬ 
ify Lynn’s hypothesis. One pair of inves¬ 
tigators has reported correlations between 
Lynn’s IQ estimates and two graduate stu¬ 
dents' estimates of the skin colors of peo¬ 
ple in various countries. (The students had 
not been to the country in question.) Pos¬ 
itive findings were interpreted as support 
for the hypothesis. 206 Another investigator, 
Satoshi Kanazawa of the London School of 
Economics, has argued that Lynn is right in 
general, but that evolution was not due to 
the cold or seasonal change per se. It was due 
to the need to meet challenges that were not 
present in the evolutionary homeland from 
which mankind evolved, in the area occu¬ 
pied by present-day Ethiopia. 207 

Lynn supports his hypothesis by refer¬ 
ence to group differences in brain size, esti¬ 
mated from studies of cranial capacity. Even 
if this data is accepted, the group differences 
in brain size and the correlations between 
brain size and intelligence are far too small 
to account for the large racial/ethnic dif¬ 
ferences in IQ scores. Besides, the fossil 
record indicates that human brain sizes have 
decreased over the past 20,000 years. 208 This 
coincides with the expansion of Homo sapi¬ 
ens into the high latitudes following the 
retreat of the glaciers. This is not in accord 

206 Templer & Arikawa, 2006. 

207 Kanazawa, 2004, 2008. 

208 Geary, 2005, p. 53. 


with Lynn’s proposal that northern expo¬ 
sure, if you will, produced bigger brains. 

Using the record of historical events, 
rather than modern test scores, Jared 
Diamond has proposed a quite different 
scenario. 209 He maintains that the problem¬ 
solving skills and general cultural knowl¬ 
edge of a people will be determined by the 
extent to which they are forced to con¬ 
front other groups of people and exchange 
ideas with them. Prior to relatively modern 
times, geography posed substantial barriers 
to such exchanges, for pre-industrial groups 
found it difficult to move through unfamiliar 
ecologies. 

To take one of Diamond's favorite exam¬ 
ples, people who lived in the highlands of 
the island of New Guinea developed agri¬ 
culture at about the same time it was devel¬ 
oped in Mesopotamia. The New Guinea 
highlanders were isolated from the coast 
by harsh mountain and jungle conditions; 
Mesopotamia was at the hub of potential 
travel routes into China and India, Europe, 
and northern Africa. Except for agriculture, 
the New Guinea highlanders were stuck in 
the Stone Age until they were contacted 
in the 1930s. The development of agricul¬ 
ture in Mesopotamia was a major step in 
the history of civilization. 

According to Diamond the European- 
northeast Asian peoples prospered intellec¬ 
tually because the major axis of the Eurasian 
land mass, East-West, facilitated commu¬ 
nication, while the major axes of Africa 
and the Americas, North-South, presented 
major barriers to communication. The rel¬ 
evant communication took place over his¬ 
toric, not evolutionary, time. Diamond can 
be read as stressing social evolution of 
societies, not genetic evolution of individ¬ 
uals, as a major force in the development of 
intelligence. 

We have two truly orthogonal explana¬ 
tions: the East-West one and the North- 
South one. Is there any evidence that can 
help us decide? 

Personally, I am more impressed by Dia¬ 
mond's emphasis on the spread of ideas over 

209 Diamond, 1997. 

historical time than by Lynn's ideas about 
presumed selective pressures on the genotype 
over evolutionary time. However, both 
Lynn’s and Diamond’s analyses are “just so” 
stories. Our present-day distribution of cog¬ 
nitive skills is the result of the unique history 
of humans on Earth. We can develop stories 
about how this distribution came to be. We 
can show that some are not in accord with 
the facts, but given only one history, over 
evolutionary or historic time, it is extremely 
unlikely that we will settle on one of sev¬ 
eral alternatives. What is the right one? We 
could answer this only by a careful analy¬ 
sis of the cognitive behavior of populations 
long since extinct. And this data we cannot 
have. 

11.5.6. Summing Up the Evidence for the 
Worldwide Distribution of Intelligence 

I have spent some time discussing Lynn and 
Vanhanen’s work for two reasons. One is 
that their results have been widely publi¬ 
cized but, to my concern, too often accepted 
and too rarely carefully critiqued. The sec¬ 
ond one is that these findings are important. 

Lynn and Vanhanen’s critics have tended 
to reject their ideas out of hand, either 
because they see IQ scores as irrelevant out¬ 
side of the industrial/post-industrial coun¬ 
tries or because they are concerned about 
the weakness of many of the data points. 
Such criticisms miss the mark. IQ scores are 
not irrelevant as indicators of intelligence, 
in the conceptual sense, in the developing 
world; they are partially relevant. As Rinder- 
mann showed, IQ scores are so highly cor¬ 
related with indices of educational achieve¬ 
ment that, statistically, one can serve as a 
proxy for the other. Both are indicators of 
the extent to which a national population 
possesses the cognitive skills required in the 
post-industrial world. These indicators are 
important because the world is increasingly 
dominated by the developed countries, so 
the future of developing countries is tied to 
the extent that they can acquire the skills 
needed in post-industrial society. 

Lynn and Vanhanen are investigating 
a question that deserves serious study. 


Therefore, it is not surprising that they 
close both of their books with policy rec¬ 
ommendations. They believe that genetic 
constrictions are so great that, although 
modest improvements may be made by 
improved nutrition, not very much can 
be done to improve human capital in 
the poorer nations. They are especially 
pessimistic about the sub-Saharan African 
nations. They say 

The persistence of differences in intelligence 
between nations is inevitable, and so, too, 
will be the consequences: the persistence of 
national differences in wealth. Or, as St. 
John put it 2000 years ago, “the poor you have 
always with you.” 

Lynn and Vanhanen, 2006, p. 293 

Lynn and Vanhanen’s conclusion is a 
good example of a mindset for defeat. We 
have a problem; it’s hard to solve; let’s 
regard it as unsolvable. Rindermann’s anal¬ 
ysis shows that they have badly underval¬ 
ued the effects of investment in education. 

I, personally, agree with Diamond that the 
important thing is to have an exchange of 
ideas in a population - something that is 
related to but more than formal education. 
So I have a very different view. 

Can we do something to improve the 
intelligence of the poorer, often desperately 
poorer, peoples of the world? I wrote this 
chapter the week that Barack Obama was 
inaugurated President of the United States. 
My question was answered by the signature 
slogan of his campaign: 

Yes we can! 

Barack Obama, in many speeches dur¬ 
ing the 2008 US presidential campaign. 

Whether we actually will do anything to 
improve cognitive skills on an international 
scale is a policy decision, and as such beyond 
the scope of this book. 

11.6. Closing Comments on Group 
Differences in Intelligence 

Intelligence does vary with demography. 

As we age, our information-processing 
mechanisms deteriorate. The speed of 
mental processing slows, and our ability to 
control attention lessens. Until great age, 
though, these effects do not begin to match 
the beneficial effects of practice. Your fifty- 
five-year-old surgeon may not be as quick 
as a physician just out of medical school, 
but the experienced surgeon is likely to rec¬ 
ognize a situation more rapidly than a neo¬ 
phyte, and to be much more efficient at deal¬ 
ing with it. Your fifty-five-year-old airplane 
pilot may not have the same capacity to con¬ 
trol attention that he or she had thirty years 
ago, but the experienced pilot knows better 
what to attend to, and has a library of expe¬ 
riences that can be drawn on. Fluid intelli¬ 
gence, Gf, may be a dramatic thing to study 
in the laboratory, but in everyday life there 
is a great deal to be said for Gc. The quickest 
way to solve a problem is to recognize that 
you have seen it before and already know 
the answer. 

These trends are important. The popula¬ 
tion is aging. Sheer economics dictates that 
we will have to utilize an older workforce. 
When life expectancies approach ninety, 
neither the individual nor the society can 
afford retirement at sixty. Designers of 
future workplaces must take account of the 
changing nature of cognition during the sec¬ 
ond half of life. 

Men and women do not think alike, a 
statement that may be a revelation to psychologists 
but certainly will not surprise 
novelists. There is little, if any, average difference 
in general reasoning between men 
and women. In defiance of the novelists’ 
intuitions, when it comes to general reasoning 
men are more variable than women. 

Just as aging is a matter of degree, so 
is thinking like a man or a woman. Over 
and over again in this chapter I have used 
the word “tend.” That is important. Women 
tend to rely more on verbal thinking, men 
tend to be better at spatial-visual reasoning. 
There is plenty of overlap. Nevertheless, I 
think that we will continue to see that most 
helicopter pilots are men. 

Why these differences occur is still a topic 
of research. The brain size hypothesis put 
forward so strongly by Lynn and Rushton 
appears to me to be a nonstarter. Differ¬ 


ences in brain structure and hormonal influ¬ 
ence are far more likely causal factors. Given 
the influence of prenatal and neonatal hor¬ 
mone circulation upon brain development, 
these causes are probably intertwined. And 
then there are the social causes - differen¬ 
tial experience and different values for social 
roles. These differences have their influence; 
and they have their reasons for being, both 
in the past and in today’s society. My own 
belief is that individuals should be as free as 
possible to choose their social roles, within 
the limits of their own capacity, and that 
other people should respect those choices. 
At this point we are getting into the realm 
of social belief, rather than a discussion of 
the science of human intelligence, so I will 
say no more. 

Suppose it were possible to take those 
cognitive skills that are required to “make 
it” in modern, post-industrial society and 
to assign every person in the world a num¬ 
ber, reflecting how many of those skills he 
or she had. (If we had the ultimate IQ 
test, we could do this!) We would find that 
within countries some racial/ethnic groups 
had a lower distribution of numbers than 
other groups had. As in the case of male- 
female differences, there would be substan¬ 
tial overlap, although probably not as much 
as the overlap in the distribution of numbers 
between men and women. What would we 
find across countries, and more particularly, 
within the developed countries? 

The northeast Asian and White eth¬ 
nic groups would have almost equal num¬ 
bers. Which group had the highest aver¬ 
age would probably depend upon whether 
we weighted verbal skills more heavily than
spatial-mathematical skills or vice versa. 
What is the best weighting? It depends upon 
whether you are looking for the next Ein¬ 
stein or the next Shakespeare. 

African-derived groups, “Blacks” in a jargon
that ignores the fact that there are
Melanesian and Australian-Papuan groups 
that are every bit as dark as Africans but are 
genetically distant from them, would almost 
certainly tend (that word again) to have 
lower numbers on a test of skills required 
to make it in post-industrial society. Some 
of these skills are undoubtedly relevant to
all societies, but we have little idea what 
the mix is. Genetically, the African-derived 
groups cluster differently from other groups, 
but the classification is fuzzy, especially 
when different racial/ethnic groups have 
been living close to each other for gener¬ 
ations. Socially, the groups that have low 
scores are generally those groups that have 
the least contact with post-industrial soci¬ 
ety. This remark applies both to national 
groups and to segments of a national soci¬ 
ety that are segregated from each other, or 
from schooling and social opportunities, to 
any degree. 

Are these racial/ethnic distinctions 
important? Yes, because to the extent that 
members of a group wish to participate in 
the industrial and post-industrial society, 
they have to have the skills that society 
utilizes. Are the distinctions inevitable? 
Some professors and some politicians have 
proclaimed, loudly, that they know the 
answer to this question. However, those 
people who are so certain seem to disagree 
rather vehemently about whether the 
answer is “yes” or “no.” I do not expect them 
to agree with each other, any more than 
I expect that the Pope and Shiite Islam’s 
Grand Ayatollahs will agree on the nature 
of God. 

It could be that there are genetic con¬ 
straints that make inequality of cognition 
across groups inevitable. This hypothesis 
can never be ruled out, for doing so would 
require proving the null hypothesis and, as 
any good statistics instructor will tell you, 
that is a logical impossibility. It is worth 
remembering that no genes related to the dif¬ 
ference in cognitive skills across the various 
racial and ethnic groups have ever been dis¬ 
covered. The argument for genetic differ¬ 
ences has been carried forward largely by 
circumstantial evidence. Of course, tomorrow
afternoon genetic mechanisms producing
racial and ethnic differences in intelligence
might be discovered, but there have
been a lot of investigations, and tomorrow 
has not come for quite some time now. 

A number of social and environmental 
causes for racial/ethnic differences in intelli¬ 
gence have been proposed. Some refer to the 
physical environment: health, nutrition, and 
exposure to atmospheric toxins. Some refer 
to the social environment. These include 
constraints imposed by the general society, 
such as lack of adequate schools (particu¬ 
larly a problem in developing countries, but 
certainly a problem in the industrially devel¬ 
oped ones) or discriminatory practices. In 
other cases the constraints are imposed by 
the group’s own mores. Examples would 
be having more children than a family can 
raise effectively, differential discrimination 
within a group (think of fundamentalist 
Islam’s view of women’s education), and 
attitudes within a peer group toward short¬ 
term achievement as opposed to long-term 
achievement in education. 

The causes of differences in cognition 
between old and young, men and women, 
and various racial/ethnic groups should be 
investigated. We have made legal and prac¬ 
tical distinctions between these categories 
in the past, we do so now, and we proba¬ 
bly will do so in the future. Retirement reg¬ 
ulations, antidiscrimination policies, social 
support for mothers and their children, and 
different forms of affirmative action are all 
part of a rational society. Demographic dif¬ 
ferences in intelligence are relevant to these 
policies, regulations, and programs. It is best 
if science informs policy makers, so inquiry 
is appropriate. On the policy makers’ side, 
scientists should not be restricted in their 
inquiries because the results might be inconvenient.
On the scientists’ side, the results
must be fully and honestly reported, regard¬ 
less of the scientists’ personal beliefs about 
social policy. 


CHAPTER 12 


Summary and Prospectus 


It’s tough to make predictions, especially 
about the future. 

- attributed to Yogi Berra, 
American sports figure 1 

A colleague of mine once observed that the 
ending sections of all scientific discussions 
could be collapsed to a single word - much. 
Much has been learned and much remains 
to be learned. That is certainly true of intel¬ 
ligence. But let us be more specific. 

12.1. A Summary 

Scientific research on intelligence is just over 
a century old. There has been progress; it 
just has not been as rapid as we might have 
hoped. A great deal of this research has 
come in spurts. As in many other sciences, 
these spurts have occurred when techno¬ 
logical or historic developments open up 
new sources of relevant data. In research on 
human intelligence that has happened three
times. The development of group testing
initiated by the Army Alpha examination
in World War I showed that valid intelligence
measures could be obtained without
expensive one-on-one interviews. The result
was an explosion of applications, and data,
in both academic and industrial settings. In
the 1930s the development of mathematical
techniques for handling multivariate data
made possible the evaluation of sophisticated
psychometric models of human intelligence.
The impetus of this development was
magnified with the development of computers,
which made computation-heavy methods
of analysis possible. Since about 1985
we have been busy digesting data from a
new source - the extensive neuroscientific
measurements that are now possible. These
include noninvasive techniques for looking
directly at indicators of functioning in the
normal brain. So what have we learned from
all this?

1 The quotation has also been attributed to the
physicist and Nobel Prize winner Niels Bohr.

Chapter 1 presented a view of intelli¬ 
gence as a concept, and as it is assessed 
by IQ and related tests. I argued, in agree¬ 
ment with several other authors (espe¬ 
cially Robert Sternberg, but with echoes of 
Raymond Cattell), that intelligence should
be thought of as the collection of cogni¬ 
tive skills and knowledge required to achieve 
success in one’s society. Some of these skills 
are common to all human societies, and 
some of them have varying degrees of rele¬ 
vance depending on the society. Our present 
cognitive tests evaluate some of the cogni¬ 
tive skills and knowledge needed in mod¬ 
ern industrial and post-industrial societies. 
Without making any statement about the 
moral values involved, the practical fact is 
that these skills are important worldwide, 
for there are far more societies interested in 
becoming industrial or post-industrial cul¬ 
tures than there are developed societies 
that want to revert to farming and herd¬ 
ing, let alone to being gleaners and hunter- 
gatherers. 

Chapter 2 introduced the tests. My exam¬ 
ples were certainly not exhaustive, but I 
think that they will give readers an idea of 
what the tests are like. In that discussion I 
argued strongly, and not for the only time, 
that the biggest restriction on intelligence 
testing has been an acceptance of intelligence
as something that has to be demonstrated
within the “Drop in from the Sky”
test paradigm. 

In Chapter 3 I presented an overview of 
the distinction between behavioral theories 
of intelligence and theories based on find¬ 
ings in the neurosciences. I concluded that 
there is room for both of them. Chapter 4 
expanded on this by discussing the concepts 
of psychometric analysis and the models of 
intelligence that have been developed using 
psychometric tools. I made the (somewhat 
controversial) assumption that the theory 
you want depends upon what you want to 
do with it. If the goal is to understand how 
intelligence is derived from biological vari¬ 
ables, then the g-VPR model is useful. If the 
goal is to understand how intelligence is used 
in society, then the g and three-stratum Gc- 
Gf models have much to recommend them. 
The two goals are equally legitimate and can 
be explored with equal scientific rigor. 

Chapter 5 presented theories that are out¬ 
side the psychometric mode. I concluded 
that, in spite of the publicity that it has
generated, there is little empirical support
for Howard Gardner's Multiple Intelligence 
theory. I take no stand on how useful it is as 
an inspiration for educators. Robert Stern¬ 
berg’s Theory of Successful Intelligence is 
rooted in empirical data, and has clear con¬ 
nections to the use of information about 
intelligence in society. Psychometric theo¬ 
ries are silent on this point. While many of 
the individual studies in support of Stern¬ 
berg’s theory are, in my opinion, not as 
strong as they have been made out to be, 
the overall support cannot be disregarded. A 
special strength of this theory is that it can 
be expanded to evaluating behavior outside 
of the testing situation. 

In that chapter I also shed a small tear 
for J. P. Das's PASS model of intelligence, 
which may not have received the attention, 
evaluation, and expansion that it should 
have. The reasons seem to me more historic 
and social than intellectual. 

Chapter 6 discussed the relationship 
between behavioral measures of human 
information processing and intelligence, as 
defined by test scores. It was concluded 
that there are two major components of 
the information-processing underpinnings 
of intelligence: cognitive processing speed 
and individual differences in the function¬ 
ing of the working memory-control of atten¬ 
tion complex. I questioned whether it is use¬ 
ful to try to break down these relationships 
more finely. I think it is not; others may 
disagree. 

Chapter 7 discussed the brain structures 
shown to be involved in intelligent action. 
It is clear that there is no “center of intel¬ 
ligence” in the brain. Intelligent behavior 
depends upon the functioning of a coordi¬ 
nated system involving many parts of the 
brain. Certain regions are more involved in 
cognitive behavior than others. The frontal 
lobe and parts of the parietal lobe are espe¬ 
cially prominent in cognition. But the whole 
brain is needed. And yes, bigger-brained 
people are smarter, and smarter people have
brains that function more efficiently than
those of less-smart people. This is true in the
literal sense of efficiency; more intelligent people
show less brain metabolism, not more, than
less intelligent people when both are presented
with the same problem.

Chapter 8 discussed the very strong evi¬ 
dence for a genetic contribution to intelli¬ 
gence, from Quantitative Behavior Genet¬ 
ics, and the lack of evidence from Molecular 
Genetics that could identify the gene- 
intelligence link. This is a point where time 
will tell. We need more studies of the molec¬ 
ular genetic link. We do not need any more 
studies trying to tie down the value of heritability
coefficients in the general population.
It was important to determine that the
heritability coefficient is not zero and that 
it is not one. Where does it fall in between? 
That number will vary over time and place. 

Chapter 9 turned the argument around, 
by presenting evidence for a myriad of 
physical and social environmental variables 
that affect intelligence. These include atmo¬ 
spheric lead, how much alcohol your mother 
consumed when you were in utero, and how 
good and how extensive your schooling was. 
The point was made that intelligence devel¬ 
ops with cognitive engagement; some peo¬ 
ple acquire wisdom as they go through life, 
others just have experiences. 

Chapter 10 presented the benefits of intel¬ 
ligence, including its influence on educa¬ 
tional achievement, social achievement, and 
such mundane things as income and occupa¬ 
tion. Contrary to widespread beliefs, intelli¬ 
gence does not just predict academic suc¬ 
cess. It predicts workplace and academic 
success about equally well. The prediction 
is not perfect, but it is better than any 
other predictor we have today, most defi¬ 
nitely including personality tests. It was also 
pointed out in this chapter that there is no 
support whatsoever for the idea that the 
very intelligent are social nerds, or that once 
you have an adequate supply of intelligence 
other personality traits become more impor¬ 
tant. Intelligent people, as indicated by test 
scores, tend to be successful people, and 
the more intelligence you have, the brighter 
your prospects. 

Chapter 11 discussed group differences 
in intelligence. As we grow older we grow 
slower, in a literal sense, and have less control
over the attention-working memory
complex. That is the bad news. The good
news is that older people know more. Elders 
may solve by memory a problem that the 
young have to work out for themselves. Rec¬ 
ollection and analogy to previously solved 
problems are powerful aids in thinking. 

There are differences in the ways that 
men and women think, but they lie much 
more along a verbal-spatial reasoning axis 
than in general intelligence. For some 
unknown reason, though, men are more 
variable in their general reasoning powers 
than women are. 

With respect to racial/ethnic and national 
distinctions: yes, there are considerable dif¬ 
ferences between some groups in the pos¬ 
session of general cognitive skills, the g 
dimension of intelligence. These differences 
are related to the socioeconomic and edu¬ 
cational achievements of group members, 
on both the national and international lev¬ 
els. While some environmental causes for 
group differences have been uncovered, 
their effects are insufficient to account for 
the observed differences between groups 
and nations. Arguments have been made 
assigning the causes to genetic differences, 
but no example of an actual genetic mecha¬ 
nism causing group differences has ever been 
found. 

I closed Chapter 11 with a brief discus¬ 
sion of the problems involved in studying 
this topic in the context of understandably 
heated social concerns about the existence 
of, or implications of, racial and ethnic dif¬ 
ferences in intelligence. 

That is where we have been. Now where 
are we going? 

12.2. The Future 

Yogi was right: predicting the future
is difficult. I will try anyway. 

Science advances when one of three 
things happens. A scientific discovery may 
open up new sources of relevant data. 
Developments in solid-state, low-tempera¬ 
ture physics led to today’s imaging technol¬ 
ogy, which opened a huge new source of 
data for the neuroscientists. Or a new idea, 
perhaps imported from another field, may
bring about a new way of looking at known 
facts. Darwin’s Theory of Evolution, devel¬ 
oped after studying finches, turtles, and bee¬ 
tles, has profoundly altered the way we think 
about differences in behavior between men 
and women. Then there is the third way to 
change. Sometimes a genius comes along, 
someone who arrives at a time when the 
old ideas are not quite working and who has 
the creativity and breadth of thought to put 
everything together. Newton, Darwin, and 
Einstein qualify, but there aren't that many 
other examples at their level of accomplish¬ 
ment. 

There are two reasons for studying intel¬ 
ligence: to understand how individual dif¬ 
ferences in cognitive capability come about, 
and to understand what those differences 
mean in our society. The two purposes are 
similar in two ways. They are both legiti¬ 
mate scientific goals, and they both require 
some way of describing variations in intelli¬ 
gence. 

I think that our present descriptions of 
intelligence are pretty good. After all, they 
are the culmination of over a century of 
careful thought. It turns out that if you 
want to look down from intelligence to the 
brain, the g-VPR theory may be best; if you 
want to think about how intelligence is used 
in society, the g model, some version of 
the three-stratum theory, or of Sternberg's 
Theory of Successful Intelligence may be 
most useful. This will depend upon the con¬ 
text in which the study is conducted. Note 
the difference. Understanding how the brain 
produces individual differences in cognition 
is a single goal, and a single description may 
be appropriate. Understanding how intelli¬ 
gence is used “in society” involves disparate 
goals for disparate situations, and differ¬ 
ent descriptions may be needed. I do not 
find this particularly bothersome. I do find 
Procrustean attempts to make one model fit 
all situations a bit annoying. 

What about new sources of data? With 
respect to the brain-mind connection we 
have every reason to be optimistic. Advances
in imaging, biochemistry, and molecular
biology have opened, and will continue to open,
huge new fields of data for psychological
inquiry. The biggest constraint on advances 
here may be the expense involved. The 
equipment and techniques are very expen¬ 
sive now, but that may change. One thing 
will not change. Studies of individual differ¬ 
ences require lots of people, committing a 
lot of their time. Time is money. 

What about new data on the social devel¬ 
opment and uses of intelligence? I am a 
bit optimistic here, even though spectacu¬ 
lar new machinery like that used by neu¬ 
roscientists is not on the horizon. There 
are two things that have to be done. First, 
we must break down the artificial boundary 
between the study of personality and the 
study of intelligence. In the world, the two 
work together. There are examples where 
this is already being done, notably in the 
work of Ackerman and in the major longitu¬ 
dinal studies, such as the Seattle Longitudi¬ 
nal Study and the Study of Mathematically 
Precocious Youth. I am optimistic that this 
sort of research will develop. But then comes 
the next issue. 

We have to break out of the rigid 
paradigm of thinking of intelligence as some¬ 
thing to be measured only in a formal test¬ 
ing session, the “Drop in from the Sky” 
paradigm. That paradigm has its uses, and 
it has had its day. We may be on the edge of 
a technological breakthrough that will allow 
us to do just this. In the modern world peo¬ 
ple leave a great many electronic tracks as 
they move about. Banking and credit com¬ 
panies know a great deal about people's 
movements and spending habits. Central¬ 
ized medical records are already widespread 
in Europe, and the United States is mov¬ 
ing toward creating such databases. Studies 
have already been reported in which peo¬ 
ple’s movement through space was tracked 
by analyzing records of calls on cell phones. 2 
Special electronic badges have been used 
to record nonverbal communication during 
social mixers and business meetings. 3 The 
resulting data banks contain a tremendous 
amount of information that, potentially,
could be used to construct models of how
intelligence is developed and used in the
world outside of both the laboratory and
the testing paradigm. The result might
be breakthroughs in our understanding of
the social aspects of intelligence that will
rival the breakthroughs achieved by the
use of imaging technologies and genetics
to understand the biological aspects of
intelligence.

2 Song et al., 2010.

3 Pentland, 2010.

The use of central databases and tracking 
technologies raises serious issues about privacy
and accountability. If society sees sufficient
value in expanding our knowledge of
how cognition is used in everyday life, then 
these problems can be solved. Will they be 
solved? That depends on how well those
interested in the study of intelligence make
their case. 

Then there is that last way of advancing - 
finding the genius who puts it all together. 
What we need is that person who will bring 
his or her one-in-ten-million intelligence to 
bear on the problem. According to the lat¬ 
est Digest of Educational Statistics, approx¬ 
imately 700,000 people graduate from US 
four-year institutions annually. Let us dou¬ 
ble this to allow for geniuses outside the 
United States. We can expect the necessary 
genius to obtain a bachelor’s degree some¬ 
time in the next ten to fifteen years. Then 
the trick will be to interest that person in the
Psychology of Intelligence. 
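
The arithmetic behind that estimate can be checked with a rough sketch. It assumes, purely for illustration, that each graduate independently has a one-in-ten-million chance of being the needed genius and that the doubled figure of 1,400,000 graduates per year holds steady. The function names are hypothetical and are not taken from any source.

```python
import math

RARITY = 1e-7                     # "one-in-ten-million" intelligence
GRADUATES_PER_YEAR = 2 * 700_000  # US figure, doubled as in the text


def expected_wait_years(rarity: float, graduates_per_year: float) -> float:
    """Mean number of years until the first qualifying graduate appears."""
    return 1.0 / (rarity * graduates_per_year)


def chance_within(years: float, rarity: float, graduates_per_year: float) -> float:
    """Probability of at least one qualifying graduate within `years`.

    Assumes every graduate qualifies independently with probability `rarity`.
    """
    graduates = graduates_per_year * years
    # 1 - (1 - rarity) ** graduates, computed in a numerically stable way.
    return -math.expm1(graduates * math.log1p(-rarity))


if __name__ == "__main__":
    print(round(expected_wait_years(RARITY, GRADUATES_PER_YEAR), 1))
    for years in (10, 15):
        print(years, round(chance_within(years, RARITY, GRADUATES_PER_YEAR), 2))
```

Under these assumptions the average wait is about seven years, and the chance of at least one such graduate is roughly three in four within ten years and close to nine in ten within fifteen.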

I hope this book will help. 
250, 251, 253, 264, 265, 267, 270, 272, 279,
281, 285, 286, 289-90, 299, 300, 310, 317,
319, 332, 334, 336, 337, 341, 343, 345, 349,
377, 378, 381, 384, 385, 404, 407, 413, 4 “/
423, 424, 432, 433, 444, 445
defined 6 

international values compared 438-40, 441

Intelligence testing 14, 16 
Intelligent Design 64 
Interests 330, 402 
International comparisons of 
intelligence 436-45 

International Mathematics and Science 
Studies (TIMSS) 421 
Interview 
structured 332 
unstructured 331 
Inuit 127 
Iraq 377, 378-82 
Iron 277 

Item independence 60 

Item response theory (IRT) 52, 55, 57, 363, 

368, 369 

Item selection 51, 52 

Japanese-Americans 421 
Jefferson, Thomas (US President) 2, 31, 185, 
436 

Jews, Jewish 414 

Jindal, Bobby (Governor, US State) 431 
Job performance (workplace 

performance) 16, 68, 91, 119, 122, 135, 324, 
329, 330, 337, 350, 416, 420-21, 426 
Job requirements 339 
Job status 342 

Johnson, Lyndon (US President) 172 
Johnson, Tim (US Senator) 140 
Judgment of motion 389, 403 


K-12 school system 312, 319-21, 322, 324, 

392-93, 417 
drop-outs from 312 

Kaufman intelligence test (KAIT) 34-36, 91,
100, 306 

Kennedy, Edward (US Senator) 28 
Kennedy, John (US President) 28 
Kennedy, Joseph 28 
Kennedy, Robert (US Senator) 28 
Kindergarten 117, 430 
Klinefelter's syndrome (XXY) 251
Knowledge 20, 96, 100, 108, 127, 131, 132, 133, 
139, 171, 268, 297 
academic 267 
culture-specific 131, 444 
domains of, in PPIK model 132 
job-relevant (including military duties) 329,
331, 332

specialized (Gk) 131-33, 139 
Korsakoff’s syndrome 278 
Kuru 275 

Language 21, 27, 47-48, 73, 115, 193, 251, 321,
391, 392

Larry P. v. Riles (court case) 422, 423 
Latent variables 18, 19, 158 
Latin-American 18, 439 

Latinos (ethnic group) 345, 348, 408, 410, 411, 
412, 413, 417, 420, 423, 424, 425, 426, 427, 
428, 429, 431, 432, 434 
definition of 411 

Law School Entrance Examination 
(LSAT) 328 
Lead 280, 432 

as atmospheric pollutant, see pollutants 
blood concentration levels 282 
contained in toys 280 
use in gasoline 280 
Leadership 126, 127 

Learning (practice) 200, 242, 401, 402, 403, 

406, 427 
Lecithin 257 
Lewinsky, Monica 95 
Lexical identification 147-51, 153, 155, 161 
Lexical retrieval 99 
Life, length (life expectancy) 14, 438 
Limbic system 137, 174, 175, 182, 188, 196, 

278 

Lincoln, Abraham (US President) 1, 61, 131 

Linguistic performance 242 

Linkage analysis 252, 283 

Literacy, see reading 

Literacy rate 437, 438, 440 

Living skills, changes in, with age 374 

Locke, Gary (Governor, US State) 431 
Logic 308

Longitudinal study 133, 261, 366, 375 
Low-normal intelligence 349-53, 354 

Madoff, Bernard 300 

Magnetic resonance imaging (MRI) 177, 179, 
180, 184 
Malaria 212 
Male/female ratios 
in SMPY 348, 384 
in special education programs 384 
in student bodies 327, 381 
Male-female differences (sex differences) 48, 
109, 174, 194, 355, 357, 359, 387, 388
in brain organization 404, 405 
in education, educational topics 391, 393,
394-97

in general intelligence 375-82, 384, 386, 404,
407, 446

in high-level test scorers 385, 393 
in variability of test scores 382-83, 395 
international comparisons 393, 401 
Male-female differences, causes of 398-406 
biological 402 
social 400-402 
Malnutrition 274, 432 
long-term 276 
short-term 275, 276 

Managerial/professional positions (reference 
executive, professional) 332-34 
Manic-depressive psychosis (bipolar 
disorder) 343 
Manifest variables 18, 19 
Map reading 390 
Marianas Islands 124 
Masai (African tribal group) 377 
Master of Business Administration (MBA) 126 
Maternal intelligence, see parents, parental 
Mathematical skill 106, 240, 242, 252, 347, 391,
392, 446

Mathematicians 305 

Mathematics 240, 241-397, 398, 400, 401, 419, 
421, 426, 428, 442 

Mating practices, see assortative mating 
Maturation rate, sex differences in 378-413, 

4 U 

McCain, John (US Senator) 162, 165 
McNamara, Robert (US Secretary of 
Defense) 350 

Measurement invariance 404 
Medicine 20 
Memory 2, 98, 106, 174 
conscious 104-105 
content 236 
declarative 182, 196 


episodic 196, 199 

for content of written passage 388 
for shapes (MS factor) 166 
immediate (short-term) 91, 96 
implicit 196 

long-term 77, 105, 143-52, 165, 196, 249, 305,
367, 374
procedural 196 
semantic 196 

short-term (storage) 96, 99, 132, 144, 147, 

149, 155, 165, 210, 239, 249, 306, 367, 372, 
406, 415 

unconscious 105 

working 73-75, 77, 84, 105, 110, 144, 145, 147, 
155-60, 163-64, 170, 175, 189, 190-92, 200,
238-39, 270, 271, 272, 276, 277, 278, 304,
308, 353, 370, 372, 374, 433, 449, 450
training of 306-307 

Memory scale (on Wechsler tests) 34 
Memory span task 74, 157, 158 
reading span 74-75 
Men (boys) (in context of specific 

male-female comparisons) 336, 337, 345, 
348, 350, 355, 356, 359, 362-63, 364, 380, 
387, 388, 389, 391, 395, 396, 403, 404, 426, 
446 

eminent 375 

Mendel, Gregor 204-205, 206, 209, 210, 283 
MENSA 8 

Mental calculation (arithmetical) 305 
Mental disability or retardation 13, 57, 153,
161, 204, 247, 248, 250, 251, 255, 276, 342,
343, 385, 422, 439

Mestizos (Latin American group) 412 
Meta-analysis 318, 323 
Metabolism 190, 198 
Meta-inventions 283, 309 
Mexico 412, 413 
MICROCEPHALIN gene 252 
Middle school 28, 92, 103, 117, 132, 319 
Military 2, 13, 20, 28, 39, 139, 293, 299, 333, 350,
351

Military enlistees 157, 276, 319, 411 
Vietnam War veterans 414 
Military performance, dimensions of 
discipline 329 
fitness and bearing 329 
general military proficiency 329 
leadership 329 
technical proficiency 329 
Military performance, general 350, 378 
Miller Analogies Test 33, 333 
Milne, A. A. 183 
Mind-set 136 

Mini Mental Status Examination 33 
Minnesota Study of Twins Raised Apart
(MISTRA) 107, 233-35, 236-387, 388
Minnesota Trans-racial adoption study 
(MTRA) 231, 285 
Minority groups 18 
Mislevy, Robert 12 
Mitochondria 246 

Mixed-race groups 412, 434, 439, 443 
Moray House Test 14 
Mortality 15 

Motivation 30, 139, 396, 402, 426 
Motor processes 151 
Motor skills 182 
Movement time 150, 153 
Mozart effect 274 

Multidimensional Aptitude Battery (MAB) 91 
Multigroup confirmatory factor analysis 364, 
365, 378, 413, 415, 426
Multiple intelligences, theory of 114 

bodily/kinesthetic intelligence 114, 115, 116, 
449 

interpersonal intelligence 116, 118 
intrapersonal intelligence 116 
linguistic intelligence 114, 115, 116, 118 
logico-mathematical intelligence 116 
musical intelligence 114 
naturalistic intelligence 116 
spatial intelligence 116 
Mutualism 96 
Myelin 175 

Name identification 148 
Napoleon (Napoleon Bonaparte) 297 
National Assessment of Educational Progress 
(NAEP) 267, 383, 392, 403-17, 418, 421 
National Association for the Advancement of 
Colored People (NAACP) 411 
National cognitive skill 

as a general factor in international 
scores 442 

index of, related to other variables 442-43 
National Council of Educational Statistics 39 
National differences in intelligence 357, 359 
National Education Association (NEA) 39 
National Football League 46 
National Longitudinal Study of Youth 
(NLSY) 292, 318, 339 

NLSY79 292, 293-94, 299, 319, 320, 323, 336, 
350, 379, 420 

NLSY97 292, 322, 379, 387

NLSY Children and Young Adults 
Study 292 

Native American, see Amerindian 
Natural kind 104, 116 
Nature-nurture debate 66-68 


n-back task 306 

Netherlands, the 414 

Neural activity 176 

Neural conduction 109 

Neural density 199 

Neural efficiency 254 

Neural plasticity 109, 198-99 

Neurons 175 

Neuropsychology 176 

Neuroscience 77 

New Guinea 275 

Newton, Isaac 65, 451 

No Child Left Behind Act 313 

Normal (Gaussian or Bell) distribution 6, 7 

Norming 56 

Nostradamus 65 

Number (numerical) facility 98, 99, 336, 340, 
341 

representation in brain 305 
Nutrition 260, 270, 273-77, 283, 287, 29^, 307,
309, 434, 439

O*NET 340

Obama, Barack (US President) 280, 445 
Occipital cortex 37, 174, 180, 181, 193, 199 
Occupation 284, 287, 338, 428, 450 
blue-collar 351 
classes of 337, 340 
managerial/professional 351 
status 339 
white-collar 351 
Odds ratio 328 

Officers Qualifying Test (OQT) 23 
Old age 9 

Orientation, orienting ability 

(navigation) 166, 167-69, 195, 199, 200, 
403, 405, 407, 443 

Otis-Lennon Test of School Ability 40-41, 320 

Panel Study on Income Dynamics (PSID) 428 
Parameters 

fixed, definition of 222 
free, definition of 222 

Parenting practice 260, 287, 294, 428, 432, 435 
Parents, parental 283, 284 
adopting 230, 231 
biological 229, 230, 231 
intelligence 69, 270, 272, 279-81, 285, 287, 
292-94 

Parietal cortex 37, 174, 180, 181, 184, 185, 188, 
190, 193, 195, 198, 199, 200, 202, 306, 372,
373, 449

Parieto-frontal integration theory (P-FIT) of 
brain function 190, 191, 202 
Parkinson's disease 271, 375 
Pascal, Blaise 27
Path analysis 221 

Path coefficients, definition of 221 
Pattern recognition 109 
Pedigree study, definition of 205 
Peers, peer group 308, 447 
Penetrance, definition of 216 
Perceptual skill 72, 106 
Perceptual-motor factor 106, 334, 341-52
Performance IQ (PIQ) 31, 34, 188 
Personality 22, 30, 122, 132, 134-37, 139, 329,
330, 354, 450

Personnel selection and classification 13, 16, 

18, 48, 56-59, 77, 110, 324, 331, 380 
in college entrance programs 402 
racial/ethnic discrimination in 425 
sex discrimination in 380 
Phenotype, definition of 205, 219 
Phenylketonuria (PKU) 248, 250, 255 
Phi Beta Kappa 12 
Phrenology 186 
Physical fitness 10 
Physical identification 148 
Physics, teaching of 303-304 
Picasso, Pablo 61, 343 
Planning, Action, Simultaneous, Serial 
(PASS) Model 37, 449 
Policy issues 62, 291, 296, 358, 409, 416, 417,
425, 447

Political development (of country) 443 
Pollutants 270 

Kosovo study 281, 288 
lead 279-309 
mercury 279 
Port Pirie study 281 
Polygenetic model 212, 251 
Polymorphisms, defined 253 
Polynesian navigators 297 
Positive manifold 80, 87, 91-94, 97, 109, 110-15 
Positron emission tomography (PET) 178, 

179, 180, 189 

Postgraduate education 327-28, 333, 397 
Post-secondary education (college, university, 
graduate) 297 

Post-secondary system of education 322 
Poverty 427 

Power (of statistical tests) 315, 317, 361, 379 
Practical intelligence 121, 122, 123, 125, 126, 128,
129, 130, 301, 332, 373
Practice 154, 400, 446 
Precocity 

mathematical 346 
verbal 346 

Predictive validity 330, 331, 354 
Premature birth 272 


Pre-school programs 69, 274, 296 
Presidency, American 218, 343 
Prestige 336, 337 

Primary abilities model of intelligence 97, 

107, 264 

Prison (jail) 349 
Problem solving 2, 393 
Procedural knowledge 122 
Processes, personality, interests, and 
knowledge (PPIK) model 130-34 
Program for International Student Assessment 
(PISA) 392, 393, 394, 401, 421, 437, 441 
Progressive Matrix item format 46, 47, 48, 50,
51, 52, 96, 160, 161, 188, 191, 266, 267, 297,
302, 303, 329, 380, 396, 397, 405, 425
Project 100, 350, 351, 352, 353 
Prospective study 318, 334, 336, 343, 350 
Protein 277 

Proximal influences 68, 215 
Psychometrics 11, 52, 72-73, 76, 77, 79-80, 91,
120, 130, 138, 142, 143, 153, 166, 199, 238,
239, 242, 367, 371, 386, 389

Quality of Human Conditions (QHC) 
index 438 

Quasi-experimental design 360 
Quiz Kids 344 

Race track handicapper study 317 
Racial/ethnic differences (“gap”) 235, 417, 443,
447, 450

biological/environmental causes for 432 
genetic causes for 433-35, 443 
in educational achievement 415-20 
in intelligence and test scores 256, 355, 357,
359, 404, 407, 411-13, 414-15, 433
Jensen-Rushton “default hypothesis” 
for 434-35 

social causes for 425-32 
Racial/ethnic groups 18, 255, 356, 408, 409,
439, 443, 446, 447, 450

RAINBOW study 125, 126
Ramanujan 354

Range restriction, effect of 314, 315, 316, 320,
321, 330

Ratings 

supervisor ratings of employee 
performance 312, 420 
teachers’ ratings of students 240, 241 
Raven Progressive Matrices Test (RPM) 46,
51, 56, 91, 95, 96, 190, 302, 330, 378,
380-82, 411, 413, 427

Advanced 33, 46, 189, 190, 284, 303, 322, 333, 

3 Sl 

Colored 46 
Raven Progressive Matrices Test (RPM) (cont.)
Standard 46, 266, 268, 303, 321, 413 
standardization study of 268 
Reaction range 24, 25, 26, 37, 258, 286, 307, 437 
Reaction time 154, 372, 433 
Reading 20, 27, 47, 68, 132, 193, 240, 241, 442 
Reagan, Ronald (US President) 2, 364 
Reasoning 40, 100, 132, 145, 158, 188, 190, 199,
200, 263, 297, 300, 302, 303, 304, 334, 335,
340, 341, 360, 391, 415
abstract 2, 50, 115, 267, 272, 278, 387 
American vs. Asian style of 50 
auditory 367 
counterfactual 431 
deductive 34-38, 49-50, 99
inductive 34, 46, 49-50, 95, 98, 99, 263, 300, 
369 

logical 161 

mathematical 48, 143, 161, 192, 283, 357-59,
399, 431
nonverbal 271 
quantitative 94, 95 
scientific 308, 399 

verbal (see also verbal ability) 96, 127, 143,
145, 189, 199, 200, 250, 278, 340, 370, 387,
4H

visual-spatial 49, 145, 174, 189, 192, 199, 200, 
263, 278, 308, 367, 370, 380, 387, 389, 401, 
406, 414, 421, 446 
Recessive trait, definition of 205 
Recommendations 3 

Recruitment effect 359, 380, 410, 411, 412, 437, 

439 

Reductionism 71-72 

Rejection rate (in personnel selection) 325, 331 
Reliability (statistical concept) 154, 188, 

313-14, 316, 317, 318, 330 
Replacement rate 208 
Research design 318 
Retrospective study 319, 343 
Rice, Condoleezza 28
Robert Bruce (King of Scotland) 184-85 
Roberts, John (US Chief Justice) 283 
Robin Hood 300 
Rodriguez, Alex 283 
Roles in society, men compared to 
women 377, 400, 401, 402, 446 
Roma (European ethnic group) 412 
Roosevelt, Franklin (US President) 39, 131, 218 
Roosevelt, Theodore (US President) 218, 298, 
300, 326 

Rotation (of visual images) 72, 106-109, 169,
305, 367, 389, 390

Rousseau, Jean-Jacques 61
Russia 127, 131 


Saint Paul (Saul of Tarsus) 65 

Salem witch trials 275 

SAT 9, 14, 15, 17, 21, 23, 32-33, 39, 41-43, 48,
50-51, 54, 56, 58-59, 61, 62, 73, 103-104,
120, 121, 125, 126, 135, 160, 164, 217, 219, 265,
284, 297, 301, 302, 313, 322-24, 325, 326-27,
333, 346, 359, 360, 385, 395-97, 418, 419,
421, 422, 424

Saturated model, definition of 223 
Schizophrenia 343 
Scholastic ability 40 
Scholastic Aptitude Test, see SAT
School, schooling 28, 114, 117-18, 277, 447 
Science, knowledge of 241-398 
Science, Technology, Engineering, 

Mathematics (STEM) 392, 395-98, 399, 
400, 402 

Seattle Longitudinal Study 263, 265, 268, 318, 
367, 369, 370, 371, 373, 375, 451 
Segregation (of racial-ethnic groups) 355 
Selection pressure 208, 210, 416, 445
driven by climatic conditions 444 
Selection ratio (complement of rejection 
rate) 331 

Selection restriction (in personnel 
classification) 314, 315 
Self-description 137 

Self-discipline (personality variable) 135-36 
Semantic identification 147 
Semantic priming 163 
Sensory processes 151 
Sentence comprehension 99, 141, 142 
Sentence verification paradigm 195 
Serb (European ethnic group) 412 
Serial processing 147 
Sesame Street (TV program) 283 
Sex chromosomes, definition of 211 
Sherlock Holmes 49, 238 
Schumer, Charles (US Senator) 162
Siblings (SS) 220, 224-25, 228, 238, 243 
Sickle-cell anemia 212 
Simple reaction time 371
Single nucleotide polymorphism (SNP) 247,
253

Sioux (Lakota) 3 

Sitting Bull 3 

Situation awareness 156 

Situation model 74, 161-62, 165, 169 

Situational judgment test 333 

SNAP-25 gene 251-52

Social achievement 21 

Social beliefs, see policy issues 

Social complexity, changes in 308, 310 

Social drinking 278-79 

Social relevance 22 
Socioeconomic status (SES) 15, 16, 17, 25, 62,
68, 97, 217, 219, 220, 226, 230, 231, 232, 233,
243, 244, 255, 258-60, 270, 279-283, 284-286,
287, 288, 290, 291, 294, 295, 298, 307,
318, 319, 326-27, 337, 345, 349, 360, 414,
421, 427-28, 430, 432

Soldiers (military personnel) 262, 268, 329-30 
South Africa (country) 412 
Span tasks (see also digit span) 156-57
memory 157 
processing 157 
reading 163, 189, 239 
Spatial orientation 21, 263, 390 
Spatial skills 340-50, 403, 415, 446 
Spearman's hypothesis 363-64, 365, 414-15 
Special education 242, 384 
Speed 142, 146, 200, 304 

cognitive processing (Gs) 99, 152, 155, 304, 
305, 371-72, 373, 374, 387, 445, 450 
decision (Gt) 99 

information-processing 77, 96, 145-47, 154,
155, 170, 189, 238, 433, 449
of closure (CS) 166 
of rotation (SR) 166 
perceptual 98, 240, 371, 386 
Speed-accuracy tradeoff 151 
Spies 4 

Standard score 54 
definition of 6 
Standardization 11, 269 
Stanford, Allen 300 

Stanford-Binet 7, 12, 13, 32, 34, 40, 122, 265, 

269 

Stanine, defined 383 

Stereotype threat 357-59, 361, 402, 426 

Structural equation modeling 158, 221, 237, 

244 

Structure of intelligence model 106 
Study of Mathematically Precocious Youth 
(SMPY) 346-49, 354, 366, 374, 398, 399, 
400 

Styles (of thought) 129 

Successful intelligence, theory of 120-30, 449, 

45 1 

Sudan 28, 302, 303 
Summers, Lawrence 381-400, 415 
Supervisor ratings 122 
Syntax 75, 141 
System (of variables) 28 
closed 28 
open 28 
social 70 

variables external to (external or exogenous 
variables) 28, 29 

variables in (system variables) 28, 29 


System analysis 30, 68 

Tacit knowledge 122-24, 127, 129, ^8
Taxicab drivers 195, 202 

Temporal cortex 37, 75, 174, 180, 181, 193, 197, 
198, 199, 200, 258, 270 

Terman study of gifted (“Termites”) 344-46,
349, 354, 366

Test sophistication, test-taking skills 266, 302 
assessed irrelevant skills 423 
assessed relevant skills 423 
unassessed relevant skills 423, 426 
Tests (see also names of individual tests) 3 
appropriateness across racial/ethnic 
groups 361 

batteries of subtests 31, 40-44, 302, 321, 377, 
380, 407, 411 
British Empire's use 4 
Chinese Empire's use 3 
cognitive 2, 8, 13 
g-loaded 357 
gender bias on 378 
group 13, 31-38, 39-47 
individual 31, 32, 33-39 
information-processing 272 
intelligence (IQ) 3, 5, 8, 9, 12, 21, 23, 33, 
121-22, 266, 271, 284, 304, 308, 373, 446 
mathematics 387 
objections to 3, 17, 120-21
performance on 72, 241 
scores 133, 137, 154, 189, 201, 204, 220, 241, 
242, 254, 260, 261, 265, 266, 270, 293, 295, 
298, 301, 309, 319, 326, 334, 337, 339, 349, 
360, 407, 409, 421, 426, 431, 433 
single-format 32, 44 
situational judgment 122, 126, 129 
specific factor evaluated by test 407 
validity of 18, 128, 438 
verbal 272, 380, 421 
visual-spatial 380 
TETRIS (computer game) 198 
Texas adoption project 230 
Text model 73, 161, 169 
Theory 64-78 

competition between 66-68 
scientific 65, 77 

Three-minute reasoning test 44, 146 
Three-stratum model 99-106, 109, 138, 139,
152, 278, 367, 370, 371, 391, 449, 451
first stratum 99 
second stratum 99, 100 
processing speed (PS) factor 389, 449 
third stratum (see also general
intelligence, g) 100
visualization (Vz) factor 389 
Trait complex 133

conventional complex 134 
intellectual complex 134 
science and math complex 133 
social complex 134 
Truman, Harry (US President) 1, 2 
Tuddenham Study 262, 268 
Turner’s syndrome 250 
Twin studies 189, 232-35, 243 
Australian 238, 239 
Dutch 233, 236, 238, 239, 253 
Japanese 238 

Swedish 233, 235-36, 237, 240, 375 
Twins Early Development Study 
(TEDS) 240, 241, 242 
US Department of Veterans Affairs 239 
Twins 243 

adopted apart 234, 236 
dizygotic (fraternal) 67, 220, 221, 224-25,
226, 228, 232, 233, 236, 237, 243
monozygotic (identical) (MZ) 25, 67, 220, 
221, 226, 228, 232, 234, 236, 237, 243, 272 
virtual 228 

University of California 62 
University of Michigan 50 

Validity 14, 18, 21, 58, 63, 426 

across racial/ethnic groups 358, 422-25 
internationally 439 
predictive, defined 312 
Variability in test scores, male-female 
differences 382-85, 450 
Variance ratio, defined 382 
Verbal ability (intelligence, skills) 21, 72, 75, 
94, 95, 136, 151, 155, 192-93, 268, 378, 403, 
446 

Verbal ability (linguistic skills) 21, 72, 75, 106,
162, 334, 341, 347, 387, 428
Verbal comprehension 73-74, 100, 106, 146,
156, 160-66, 169, 176, 180, 189, 263, 305,
369, 407

Verbal fluency 406 
Verbal IQ (VIQ) 31, 34, 188 
Verbal relations (primary ability) 98 
Verbal:educational factor 106 
Verbalization 96 
Video games 308 
Virtual environment 168, 390 
Vision 37 

Visual analysis of static figures 390 
Visual detection 304 
Visual imagery 166, 167, 168, 185, 193, 387 
Visual recognition 96 


Visualization 95 
Visualization factor (Vz) 166 
Visual-spatial ability 32, 48, 60, 98, 136, 
166-70, 189, 193, 194-95, 382, 397, 400, 

426 

Gv 99, 102, 103, 136, 166, 167, 168, 194, 199 
training of 389 
Vitruvius 280 

Vocabulary, vocabulary tests 1, 2, 32, 33, 91, 
106, 118, 129, 163, 169, 193, 200, 258, 272, 

302, 367 

Washington, George (US President) 20, 131, 
?> l 9 > 343 

Watson, James 344, 416-17 

Wechsler Adult Intelligence Test (WAIS) 12,
U, 34, 35, 36, 39, 46, 91, 92, 94, 98, 103, 155,
163, 182, 188, 193, 200, 201, 230, 236, 265,
284, 3 * 7, 354, 363-75, 378, 379, 380, 383,
386, 387, 407

Wechsler Intelligence Scale for Children 
(WISC) 12, 34, 57, 98, 232, 279, 286, 288, 
380 

Wechsler Preschool Intelligence Test 290 
Wechsler tests 12, 13, 31, 32, 40, 56, 269, 319,
377, 378

Weight 8 

Welfare (public assistance) 349, 354, 359 
Wernicke’s area 75, 175, 193 
White (racial-ethnic group) 18, 226, 231, 232,
235, 261, 272, 283, 290, 336, 337, 339, 345,
350, 361, 363, 364, 404, 408, 409, 410,

411-13, 415-19, 420, 421-22, 423-24, 425, 427, 
428-30, 432, 433, 434, 443, 446 
White matter 175, 187, 199, 404 
Williams syndrome 161 
Wilson, Woodrow (US President) 2 
Wisdom 24, 373, 374 
Wollstonecraft, Mary 376 
A Vindication of the Rights of Woman 376
Women (girls) (in context of specific
male:female comparisons) 305, 334, 336,
337, 345, 348, 350, 355, 356, 359, 362, 364,
380, 386, 387, 388, 389, 391, 395, 396, 403,
404, 426, 446
eminent 375, 397 

Wonderlic Personnel Test (WPT) 44-46, 48, 
73, 118, 171, 330, 338, 420 
Woodcock-Johnson Intelligence Test
(WJT) 34, 36, 100-103, 367, 368, 369, 370,
371, 412, 4 U, 4 ^

Word fluency (primary ability) 98, 106 
Word (lexical) recognition 163 
Work sample 331 
Workplace, cognitive requirements of 350
Workplace performance 328-50 
Workplace rewards (for intelligence) 334 
World War I 10, 261, 262, 263, 280, 411,
448

World War II 235, 261, 262, 263, 265, 267, 275, 
276, 296, 298, 322, 334, 354, 411, 440 


Wundt, Wilhelm 89 

X-rays 176, 177 
XYY syndrome 250-51 

Yeltsin, Boris (Russian President) 131 
Yerkes, Robert 411 






















“Earl Hunt has written a book that is deeply informed by a scholarly knowledge of 